DEPTH WEIGHTED SCATTER ESTIMATORS

By Yijun Zuo 1 and Hengjian Cui 2 
Michigan State University and Beijing Normal University 

General depth weighted scatter estimators are introduced and
investigated. For general depth functions, we find that these affine
equivariant scatter estimators are Fisher consistent and unbiased for
a wide range of multivariate distributions, and show that the sample
scatter estimators are strongly and $\sqrt{n}$-consistent and
asymptotically normal, and that the influence functions of the
estimators exist and are bounded in general. We then concentrate on a
specific case of the general depth weighted scatter estimators, the
projection depth weighted scatter estimators, which include as a
special case the well-known Stahel-Donoho scatter estimator, whose
limiting distribution had long been an open problem until this paper.
Large sample behavior, including consistency and asymptotic normality,
and efficiency and finite
sample behavior, including breakdown point and relative efficiency 
of the sample projection depth weighted scatter estimators, are thor- 
oughly investigated. The influence function and the maximum bias 
of the projection depth weighted scatter estimators are derived and 
examined. Unlike typical high-breakdown competitors, the projec- 
tion depth weighted scatter estimators can integrate high breakdown 
point and high efficiency while enjoying a bounded-influence function 
and a moderate maximum bias curve. Comparisons with leading es- 
timators on asymptotic relative efficiency and gross error sensitivity 
reveal that the projection depth weighted scatter estimators behave 
very well overall and, consequently, represent very favorable choices 
of affine equivariant multivariate scatter estimators. 

1. Introduction. The sample mean vector and sample covariance matrix 
have been the standard estimators of location and scatter in multivariate 
statistics. They are affine equivariant and highly efficient for normal
population models. They, however, are notorious for being sensitive to unusual
observations and susceptible to small perturbations in data. M-estimators 
[Maronna (1976)] are the early robust alternatives which have reasonably 
good efficiencies while being resistant to small perturbations in the data. Like 
their predecessors, the M-estimators unfortunately are not globally robust in 
the sense that they have relatively low breakdown points in high dimensions. 
The Stahel-Donoho (S-D) estimator [Stahel (1981) and Donoho (1982)] is 
the first affine equivariant estimator of multivariate location and scatter 
which attains a very high breakdown point. The estimator has stimulated 
extensive research in seeking affine equivariant location and scatter
estimators which possess high breakdown points. Though $\sqrt{n}$-consistent
[Maronna and Yohai (1995)], the limiting distribution of the S-D estimator remained
unknown until very recently. This drawback has severely hampered the es- 
timator from becoming more prevalent and useful in practical inference. 
The limiting distribution of the S-D (and general depth weighted) location
estimator(s) has recently been discovered by Zuo, Cui and He (2004).
Establishing the limiting distribution (and studying other properties) of general
depth weighted and (particularly) the S-D scatter estimators is one goal of 
this paper. 

In addition to the S-D estimator, affine equivariant estimators of multi- 
variate location and scatter with high breakdown points include the min- 
imum volume ellipsoid (MVE) and the minimum covariance determinant 
(MCD) estimators [Rousseeuw (1985)] and S-estimators [Davies (1987) and 
Lopuhaa (1989)]. A drawback to many classical high breakdown point esti- 
mators though is the lack of good efficiency at uncontaminated normal mod- 
els. Estimators which can combine good global robustness (high breakdown 
point and moderate maximum bias curve) and local robustness (bounded 
influence function and high efficiency) are always desirable. Proposing (and 
investigating) a class of such estimators is another goal of this paper. 

Breakdown point serves as a measure of global robustness, while the in- 
fluence function captures the local robustness of estimators. In between the 
two extremes comes the maximum bias curve. A discussion of the maximum 
bias curve of scatter estimators at population models (with unknown loca- 
tion), seemingly very natural and desirable, has not yet been seen in the 
literature, perhaps partially because of the complication and difficulty to 
derive it. Providing an account of the maximum bias of projection depth 
weighted scatter estimators is the third goal of this paper. 

To these ends, general depth weighted estimators are introduced and stud- 
ied. The S-D estimator is just a special case of these general estimators. The 
paper investigates the asymptotics of the general depth weighted scatter 
estimators. Sufficient conditions for the asymptotic normality and the ex- 
istence of influence functions of the general estimators are presented. They 
are satisfied by common depth functions including Tukey halfspace [Tukey
(1975)] and Liu simplicial [Liu (1990)] depth. The paper then specializes to 
the projection depth weighted scatter estimators and examines their large 
and finite sample behavior. The asymptotic normality of the S-D scatter es- 
timator follows as a special case. The influence function (together with the 
asymptotic relative efficiency) of the projection depth weighted scatter esti- 
mators is compared to those of some leading estimators. To fulfill the third 
goal of the paper, the maximum bias (under point-mass contamination) of 
the projection depth weighted scatter estimators for elliptically symmetric
models is derived.

Findings in the paper reveal that the S-D and the projection depth 
weighted scatter estimators possess good robustness properties locally (high 
efficiency and bounded influence function) and globally (high breakdown 
point and moderate maximum bias) and behave very well overall compared 
with the leading competitors and, thus, represent favorable choices of scatter 
estimators. 

The empirical process theory approach in the paper is useful for other 
depth applications. The treatment of the maximum bias of scatter estimators 
here sets a precedent for similar problems. 

The rest of the paper is organized as follows. Section 2 introduces gen- 
eral depth weighted scatter estimators and investigates their Fisher consis- 
tency, asymptotics and influence functions. Section 3 is devoted to a specific 
case of the general depth weighted scatter estimators, the projection depth 
weighted scatter estimators. Here, sufficient conditions introduced in Section 
2 for asymptotics and influence functions are verified and the corresponding 
general results are also concretized. Furthermore, the asymptotic relative 
efficiency, the influence function and the gross error sensitivity of the es- 
timators are derived and compared with those of leading estimators. The 
maximum bias curve (under point-mass contamination) of the estimators is 
also derived and examined. Finally, the finite sample behavior of the esti- 
mators, including breakdown point and relative efficiency, is investigated. 
Simulation results with contaminated and uncontaminated data confirm the 
validity of the asymptotic properties at finite samples. The paper ends in 
Section 4 with some concluding remarks. Selected (sketches of) proofs and 
auxiliary lemmas are saved for the Appendix. 

2. General depth weighted scatter estimators. Depth functions can be
employed to extend the univariate L-functionals (L-statistics) to the
multivariate setting [Liu (1990) and Liu, Parelius and Singh (1999)]. For
example, one can define a depth-weighted mean based on a given depth function
$D(x,F)$ as follows [Zuo, Cui and He (2004)]:

(1)  $L(F) = \int x\, w_1(D(x,F))\,dF(x) \Big/ \int w_1(D(x,F))\,dF(x),$

where $w_1(\cdot)$ is a suitable weight function [$w_1$ and $D$ are suppressed in
$L(\cdot)$ for simplicity]. Subsequently, a depth-weighted scatter estimator
based on $D(x,F)$ can be defined as

(2)  $S(F) = \int (x - L(F))(x - L(F))'\, w_2(D(x,F))\,dF(x) \Big/ \int w_2(D(x,F))\,dF(x),$

where $w_2(\cdot)$ is a suitable weight function that can be different from
$w_1(\cdot)$. $L(\cdot)$ and $S(\cdot)$ include multivariate versions of trimmed means and
covariance matrices. The latter are excluded in later discussion though for
technical convenience. To ensure well-defined $L(F)$ and $S(F)$, we require

(3)  $\int w_i(D(x,F))\,dF(x) > 0$ and $\int \|x\|^i\, w_i(D(x,F))\,dF(x) < \infty$, $i = 1,2$,

where $\|\cdot\|$ stands for the Euclidean norm. The first part of (3) holds
automatically for typical weight and depth functions and the second part
becomes trivial if $E\|X\|^2 < \infty$ or if $w_i$, $i = 1,2$, vanishes outside some
bounded set. Replacing $F$ with its empirical version $F_n$, we obtain $L(F_n)$
and $S(F_n)$ as empirical versions of $L(F)$ and $S(F)$, respectively. $L(\cdot)$ and
$S(\cdot)$ distinguish themselves from other leading estimators such as MVE- and
MCD-, S-, M- and CM-estimators in the sense that $L(\cdot)$ is defined
independently of $S(\cdot)$. They are also different from the ones in Lopuhaa (1999)
since no prior location and scatter estimators are needed to define them.
With the projection depth function $PD(\cdot,\cdot)$ (see Section 3), $L(\cdot)$ and $S(\cdot)$
include as special cases the well-known Stahel-Donoho location and scatter
estimators, respectively.
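
As a minimal sketch of the empirical depth-weighted mean and scatter defined above, the following Python function takes the sample points together with depth values that are assumed to have been computed already by some depth function. The function name, its arguments and the plain-list matrix representation are illustrative choices, not part of the paper.

```python
def depth_weighted_mean_scatter(points, depths, w1, w2):
    """Empirical depth-weighted mean L(F_n) and scatter S(F_n).

    points: list of d-dimensional observations (lists of floats)
    depths: precomputed depth D(x_i, F_n) of each observation (assumed given)
    w1, w2: weight functions for location and scatter (hypothetical callables)
    """
    d = len(points[0])
    a = [w1(r) for r in depths]  # location weights w1(D(x_i, F_n))
    b = [w2(r) for r in depths]  # scatter weights w2(D(x_i, F_n))
    sa, sb = sum(a), sum(b)
    if sa <= 0 or sb <= 0:
        # the weights must carry positive total mass for the ratios to exist
        raise ValueError("weights must have positive total mass")
    loc = [sum(ai * x[j] for ai, x in zip(a, points)) / sa for j in range(d)]
    scat = [[sum(bi * (x[j] - loc[j]) * (x[k] - loc[k])
                 for bi, x in zip(b, points)) / sb
             for k in range(d)]
            for j in range(d)]
    return loc, scat


# With constant weights the estimators reduce to the sample mean and the
# (divisor-n) sample covariance matrix.
pts = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
loc, scat = depth_weighted_mean_scatter(pts, [0.5] * 4,
                                        lambda r: 1.0, lambda r: 1.0)
```

Note that the scatter is centered at the depth-weighted mean built from `w1`, while the scatter weights come from `w2`, so the two weight functions can be tuned separately.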

In addition to $PD(\cdot,\cdot)$, common choices of $D(\cdot,\cdot)$ include the Tukey (1975)
halfspace depth function, $HD(x,F) = \inf\{P(H) : H$ a closed halfspace,
$x \in H\}$, and the Liu (1990) simplicial depth function,
$SD(x,F) = P(x \in S[X_1,\ldots,X_{d+1}])$, where $X_1,\ldots,X_{d+1}$ is a random sample
from $F$ and $S[x_1,\ldots,x_{d+1}]$ denotes the $d$-dimensional simplex with vertices
$x_1,\ldots,x_{d+1}$. Weighted or trimmed means based on the latter two depth
functions were considered in Liu (1990), Dümbgen (1992) and Massé (2004).
For all these depth functions, $L(\cdot)$ and $S(\cdot)$ are affine equivariant, that
is, $L(F_{AX+b}) = AL(F) + b$ and $S(F_{AX+b}) = AS(F)A'$ for any $d \times d$
nonsingular matrix $A$ and vector $b \in \mathbb{R}^d$. In fact, this is true for any
affine invariant $D(\cdot,\cdot)$ [i.e., $D(Ax+b, F_{AX+b}) = D(x,F)$]. With such
$D(\cdot,\cdot)$ and for $F$ centrally symmetric about $\theta \in \mathbb{R}^d$ [i.e.,
$F_{X-\theta}(\cdot) = F_{\theta-X}(\cdot)$], $L(F)$ is Fisher consistent [$L(F) = \theta$] and $L(F_n)$
is unbiased for $\theta$ if $E\|X\| < \infty$ [Zuo, Cui and He (2004)]. This turns out
to be true also for $S(F)$ and $S(F_n)$. That is, for a broad class of symmetric
distributions $F$ (including as special cases elliptically symmetric $F$) with
$E\|X\|^2 < +\infty$, $S(F) = \kappa\,\mathrm{Cov}(X)$ and $E(S(F_n)) = \kappa_n\,\mathrm{Cov}(X)$, for some
positive constants $\kappa$ and $\kappa_n$ (with $\kappa_n \to \kappa$ as $n \to \infty$).

$L(F)$ and $L(F_n)$ have been studied in Zuo, Cui and He (2004) and Zuo,
Cui and Young (2004) with respect to robustness and large and finite sample
behavior. The current paper focuses on $S(F)$ and $S(F_n)$. Throughout the
paper, we assume that $0 \le D(x,F) \le 1$ and that $D(\cdot,\cdot)$ is continuous in $x$
and translation invariant, that is, $D(x+b, F_{X+b}) = D(x,F)$ for the given $F$
and for any $b \in \mathbb{R}^d$.

$\sqrt{n}$-consistency and asymptotic normality. Define

$H_n(\cdot) = \sqrt{n}\,(D(\cdot,F_n) - D(\cdot,F)), \qquad \|H_n\|_\infty = \sup_{x \in \mathbb{R}^d} |H_n(x)|.$

For a given $F$, denote $D_r = \{x : D(x,F) \ge r\}$ for $0 \le r \le 1$. Let $w_i^{(1)}$ be
the derivative of $w_i$ for $i = 1,2$. A function $g(\cdot)$ on $[a,b]$ is said to be
Lipschitz continuous if there is some $C > 0$ such that $|g(s) - g(t)| \le C|s - t|$
for any $s,t \in [a,b]$. For $0 \le r_0 \le 1$, define the conditions:

(A1) $\|H_n\|_\infty = O_p(1)$ and $\sup_{x \in D_{r_0}} \|x\|\,|H_n(x)| = O_p(1)$.

(A2) $w_i(r)$, $i = 1,2$, is continuously differentiable on $[0,1]$ and $0$ on
     $[0, \alpha r_0]$ for some $\alpha > 1$, $w_2^{(1)}$ is Lipschitz continuous on $[0,1]$,
     $w_2^{(1)}(0) = 0$, and $\int_{D_{r_0}} \|x\|\,|w_i^{(1)}(D(x,F))|\,dF(x) < \infty$.

In light of Vapnik-Červonenkis classes and the CLT for empirical
processes [Pollard (1984) and van der Vaart and Wellner (1996)], it is seen
that the first part of (A1) holds for common $D(\cdot,\cdot)$ such as $HD(\cdot,\cdot)$ and
$SD(\cdot,\cdot)$. The first part of (A2) holds automatically for smooth $w_i$ such as

(4)  $w_i(r) = \dfrac{\exp(-K(1 - (r/C)^{2i})^{2i}) - \exp(-K)}{1 - \exp(-K)}\, I(r < C) + I(r \ge C),$

with parameters $0 < C < 1$ and $K > 0$ and indicator function $I(\cdot)$ (here
$r_0 = 0$), $i = 1,2$, which will be used later. Note that (A2) excludes the
trimmed means and covariance matrices with indicator functions as $w_i$.
This, however, allows us to impose fewer and less severe conditions on $F$
and $D(\cdot,\cdot)$. The second part of (A1) or (A2) holds with any $r_0 > 0$ for
common depth functions, by virtue of their "vanishing at infinity" property
[Liu (1990) and Zuo and Serfling (2000a, b)], that is,
$\lim_{\|x\| \to \infty} D(x,F) = 0$. In Section 3 we show that (A1) and (A2) hold for
$PD(\cdot,\cdot)$ with $r_0 = 0$.
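
The smooth weight function in (4) can be sketched in Python as follows. The exponents $2i$ are an assumption here: they are read off a partly garbled formula, chosen to be consistent with the later requirement $w_i^{(1)}(r) = O(r^i)$, and should be checked against the published version. The factory name `make_weight` is hypothetical.

```python
import math


def make_weight(i, C, K):
    """Return w_i(r) as in (4), assuming inner and outer exponents 2i.

    i: index of the weight function (1 for location, 2 for scatter)
    C: depth cutoff, 0 < C < 1; points with depth >= C get full weight 1
    K: positive tuning constant controlling how fast the weight decays
    """
    if not (0.0 < C < 1.0 and K > 0.0):
        raise ValueError("need 0 < C < 1 and K > 0")

    def w(r):
        if r >= C:
            return 1.0
        core = math.exp(-K * (1.0 - (r / C) ** (2 * i)) ** (2 * i))
        # rescale so that w(0) = 0 and w(r) -> 1 as r -> C from below
        return (core - math.exp(-K)) / (1.0 - math.exp(-K))

    return w


w2 = make_weight(2, 0.32, 0.2)  # parameter values quoted for Figure 1
```

Under these assumptions the weight rises smoothly and monotonically from 0 at depth 0 to 1 at depth $C$, so less deep (more outlying) points are down-weighted rather than trimmed outright.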

Theorem 2.1. Under (A1) and (A2), $S(F_n) - S(F) = O_p(1/\sqrt{n})$.

The (strong) consistency of $S(F_n)$ can be established similarly based on
corresponding conditions. Hereafter, we omit the (strong) consistency
discussion. To establish the asymptotic normality of $S(F_n)$, we need the
following conditions. Denote $\nu_n(\cdot) = \sqrt{n}\,(F_n(\cdot) - F(\cdot))$.

(A3) $\int_{D_{r_0}} \|x\|^{2i}\,(w_i(D(x,F)))^2\,dF(x) < \infty$ and
     $\int_{D_{r_0}} \|x\|^{i}\,|w_i^{(1)}(D(x,F))|\,dF(x) < \infty$, $i = 1,2$.

(A4) $H_n(x) = \int h(x,y)\,d\nu_n(y) + o_p(1)$ uniformly on $S_n \subset D_{r_0}$,
     $P\{D_{r_0} - S_n\} = o(1)$, for some $h$ with
     $\int \big(\int \|y\|^{i}\,|w_i^{(1)}(D(y,F))\,h(y,x)|\,dF(y)\big)^2 dF(x) < \infty$, $i = 1,2$,
     and $\{h(x,\cdot) : x \in S_n\}$ is a Donsker class.

Note that with a positive $r_0$, (A3) holds automatically for depth
functions vanishing at infinity. (A4) holds for HD and SD with any positive
$r_0$ [Dümbgen (1992) and Massé (2004)] and other depth functions. For details
on a Donsker class of functions, see van der Vaart and Wellner (1996). In
Section 3 we show that (A3)-(A4) hold for PD with $r_0 = 0$ and smooth $w_i$
[such as those in (4)], $i = 1,2$.

Let $\mathrm{vec}(\cdot)$ be the operator which stacks the columns of a $p \times q$ matrix
$M = (m_{ij})$ on top of each other, that is,
$\mathrm{vec}(M) = (m_{11},\ldots,m_{p1},\ldots,m_{1q},\ldots,m_{pq})'$. Let $M_1 \otimes M_2$ be the
Kronecker product of matrices $M_1$ and $M_2$. Let
$k_s(\cdot,F) = (\cdot - L(F))(\cdot - L(F))' - S(F)$. Define for $i = 1,2$,

(5)  $L_i(F) = \dfrac{\int x\, w_i(D(x,F))\,dF(x)}{\int w_i(D(x,F))\,dF(x)},$

(6)  $K_i(x,F) = \Big(\int (y - L_i(F))\, w_i^{(1)}(D(y,F))\, h(y,x)\,dF(y) + (x - L_i(F))\, w_i(D(x,F))\Big) \Big(\int w_i(D(x,F))\,dF(x)\Big)^{-1}$

and

(7)  $K_s(x,F) = \dfrac{\int k_s(y,F)\, w_2^{(1)}(D(y,F))\, h(y,x)\,dF(y) + k_s(x,F)\, w_2(D(x,F))}{\int w_2(D(x,F))\,dF(x)}.$

Theorem 2.2. Under (A1)-(A4), we have

$S(F_n) - S(F) = \dfrac{1}{n} \sum_{i=1}^{n} \big(K(X_i) - E(K(X))\big) + o_p\big(1/\sqrt{n}\big),$

where $K(\cdot) = K_s(\cdot,F) - K_1(\cdot,F)(L_2(F) - L(F))' - (L_2(F) - L(F))(K_1(\cdot,F))'$.
Hence,

$\sqrt{n}\,(\mathrm{vec}(S(F_n)) - \mathrm{vec}(S(F))) \xrightarrow{d} N_{d^2}(0, V),$

where $V$ is the covariance matrix of $\mathrm{vec}(K(X))$.
The main ideas and the outline of the proof are as follows. The key
problem is to approximate

$I_{in} = \sqrt{n}\,\Big(\int h_i(x)\, w_i(D(x,F_n))\,dF_n(x) - \int h_i(x)\, w_i(D(x,F))\,dF(x)\Big), \quad i = 1,2,$

where $h_1(x) = x - L(F)$ or $1$ and $h_2(x) = k_s(x,F)$ or $1$. The difficulty lies
in the first integrand: it depends on $F_n$. By differentiability of $w_i$, there
is $\theta_{in}(x)$ between $D(x,F)$ and $D(x,F_n)$ such that

$I_{in} = \int h_i(x)\, w_i(D(x,F))\,d\nu_n(x) + \int h_i(x)\, w_i^{(1)}(\theta_{in}(x))\, H_n(x)\,dF_n(x).$

The CLT takes care of the first term on the right-hand side. Call the second
term $I_{in}^{2}$. Then by (A1) and (A2),

$I_{in}^{2} = \int h_i(x)\, w_i^{(1)}(D(x,F))\, H_n(x)\,dF_n(x) + o_p(1).$

Now by virtue of (A3) and (A4) (and, consequently, asymptotic tightness of
$H_n$) and Fubini's theorem,

$I_{in}^{2} = \int \Big(\int h_i(x)\, w_i^{(1)}(D(x,F))\, h(x,y)\,dF(x)\Big)\,d\nu_n(y) + o_p(1).$

The desired results in Theorem 2.2 follow from the above arguments. See
the Appendix for details.

Influence function. Now we study the influence function of $S(\cdot)$. For a
given distribution $F$ in $\mathbb{R}^d$ and an $\varepsilon > 0$, the version of $F$ contaminated by
an $\varepsilon$ amount of an arbitrary distribution $G$ in $\mathbb{R}^d$ is denoted by
$F(\varepsilon,G) = (1 - \varepsilon)F + \varepsilon G$. The influence function of a functional $T$ at a given
point $x \in \mathbb{R}^d$ for a given $F$ is defined as [Hampel, Ronchetti, Rousseeuw and
Stahel (1986)]

$IF(x;T,F) = \lim_{\varepsilon \to 0^+} (T(F(\varepsilon,\delta_x)) - T(F))/\varepsilon,$

where $\delta_x$ is the point-mass probability measure at $x \in \mathbb{R}^d$. $IF(x;T,F)$
describes the relative effect (influence) on $T$ of an infinitesimal point-mass
contamination at $x$, and measures the local robustness of $T$. An estimator
with a bounded influence function (with respect to a given norm) is
therefore robust (locally, as well as globally) and very desirable. Define for
any $y \in \mathbb{R}^d$,

$H_\varepsilon(x,y) = (D(x,F(\varepsilon,\delta_y)) - D(x,F))/\varepsilon, \qquad \|H_\varepsilon(y)\|_\infty = \sup_{x \in \mathbb{R}^d} |H_\varepsilon(x,y)|.$
If the limit of $H_\varepsilon(x,y)$ exists as $\varepsilon \to 0^+$, then it is $IF(y; D(x,F), F)$. In
the following, we assume that $IF(y; D(x,F), F)$ exists. The latter is true
for the halfspace [Romanazzi (2001)], the projection [Zuo, Cui and Young
(2004)], the weighted $L^p$ [Zuo (2004)] and Mahalanobis depth (MD)
functions. To establish the influence function of $S(\cdot)$, we need the following
condition, a counterpart of (A1). Denote by $O_y(1)$ a quantity which may depend
on $y$ but is bounded as $\varepsilon \to 0$.
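
The definition of the influence function can be illustrated numerically by evaluating the difference quotient at a small, fixed $\varepsilon$. The sketch below does this for the classical (unweighted) univariate variance functional on a discrete distribution, not for the depth weighted estimators of this paper; the helper names are hypothetical. For the variance, the known limit is $(x - \mu)^2 - \sigma^2$, which is unbounded in $x$.

```python
def weighted_variance(values, probs):
    """Variance functional T(F) of a discrete distribution F."""
    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))


def influence_quotient(functional, values, x, eps=1e-6):
    """Finite-eps version of (T((1 - eps) F + eps delta_x) - T(F)) / eps.

    F puts mass 1/n on each entry of `values`; the contaminated
    distribution shrinks those masses by (1 - eps) and adds mass eps at x.
    """
    n = len(values)
    base = functional(values, [1.0 / n] * n)
    contaminated = functional(values + [x], [(1.0 - eps) / n] * n + [eps])
    return (contaminated - base) / eps


# F uniform on {-1, 0, 1}: mean 0 and variance 2/3, so the limiting
# influence function of the variance at x = 3 is 3**2 - 2/3.
q = influence_quotient(weighted_variance, [-1.0, 0.0, 1.0], 3.0)
```

The quotient grows like $x^2$ as the contamination point moves away, which is the unbounded influence that the weighting by depth is designed to avoid.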

(A1') $\|H_\varepsilon(y)\|_\infty = O_y(1)$ and $\sup_{x \in D_{r_0}} \|x\|\,|H_\varepsilon(x,y)| = O_y(1)$.

Condition (A1') holds for HD and weighted $L^p$ depth with a positive
$r_0$ and for PD and MD with $r_0 = 0$. Replace $h(y,x)$ in (6) and (7) by
$IF(x; D(y,F), F)$ and call the resulting functions $K_i(x,F)$, $i = 1,2$, and
$K_s(x,F)$, respectively. We have the following:

Theorem 2.3. Under (A1') and (A2),

$IF(y; S, F) = K_s(y,F) - K_1(y,F)(L_2(F) - L(F))' - (L_2(F) - L(F))(K_1(y,F))'.$

For smooth $w_i$, $i = 1,2$, the gross error sensitivity of $S$,
$\gamma^*(S,F) = \sup_{y \in \mathbb{R}^d} |||IF(y; S, F)|||$, where (and hereafter) "$|||\cdot|||$" stands
for a selected matrix norm, is bounded if $r_0 > 0$. If $r_0 = 0$, it is also
bounded if $\sup_{y \in \mathbb{R}^d} \|y\|^i\, w_i(D(y,F)) < \infty$, $i = 1,2$. The latter is true for
PD and MD and suitable $w_i$, $i = 1,2$ [such as those in (4)].

Note that the set $D_{r_0}$ in this section could be replaced by any bounded
set containing $D_{r_0}$ or the whole space $\mathbb{R}^d$, depending on the application.
The latter case corresponds to $r_0 = 0$. When $r_0 > 0$, by (A2), $w_i(r) = 0$,
$i = 1,2$, for $r$ in a neighborhood of 0, corresponding to a depth trimmed
(and weighted) $L(F)$ and $S(F)$ and a bounded $D_{r_0}$ for any $D(\cdot,\cdot)$ vanishing
at infinity.

This section provides a general mechanism for establishing the
asymptotics and the influence function of general depth weighted scatter
estimators. Some of the sufficient conditions presented here might be slightly
weakened in some minor aspects (e.g., for $w_1$ Lipschitz continuity suffices).
Also note that results in Theorems 2.2 and 2.3 become much simpler if
$w_1 = w_2$ or if $F$ is centrally symmetric since $L_2(F) = L(F)$ in these cases.

3. Projection depth weighted and Stahel-Donoho scatter estimators. This
section is specialized to the specific case of the general depth weighted
scatter estimators, the projection depth weighted or Stahel-Donoho scatter
estimators.

Let $\mu$ and $\sigma$ be univariate location and scale functionals, respectively. The
projection depth of a point $x \in \mathbb{R}^d$ with respect to a given distribution $F$ of
a random vector $X \in \mathbb{R}^d$, $PD(x,F)$, is defined as [Zuo and Serfling (2000a)
and Zuo (2003)]

(8)  $PD(x,F) = 1/(1 + O(x,F)),$

where the outlyingness $O(x,F) = \sup_{\|u\|=1} (u'x - \mu(F_u))/\sigma(F_u)$, and $F_u$ is
the distribution of $u'X$. Throughout our discussions $\mu$ and $\sigma$ are assumed
to exist for the univariate distributions involved. We also assume that $\mu$
and $\sigma$ are affine equivariant, that is, $\mu(F_{sY+c}) = s\mu(F_Y) + c$ and
$\sigma(F_{sY+c}) = |s|\sigma(F_Y)$, respectively, for any scalars $s$ and $c$ and random
variable $Y \in \mathbb{R}$. Replacing $F$ with its empirical version $F_n$ based on a random
sample $X_1,\ldots,X_n$, an empirical version $PD(x,F_n)$ is obtained. With $\mu$ and
$\sigma$ being the median (Med) and the median absolute deviation (MAD),
respectively, Liu (1992) first suggested the use of $PD(x,F_n)$ as a depth
function. For motivation, examples and related discussion of (8), see Zuo (2003).
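
A rough way to evaluate the empirical projection depth with (Med, MAD) is to replace the supremum over all unit vectors by a maximum over finitely many random directions. The sketch below does exactly that; it is an approximation chosen for illustration (it can only underestimate the outlyingness and hence overestimate the depth), not an algorithm given in the paper, and the function name and parameters are hypothetical.

```python
import math
import random
import statistics


def projection_depth(x, sample, n_dir=200, seed=0):
    """Approximate PD(x, F_n) = 1 / (1 + O(x, F_n)) with (Med, MAD).

    The supremum over unit vectors u is replaced by a maximum over
    n_dir random directions (an assumption of this sketch).
    """
    rng = random.Random(seed)
    d = len(x)
    outlyingness = 0.0
    for _ in range(n_dir):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]  # random unit direction
        proj = [sum(ui * pi for ui, pi in zip(u, p)) for p in sample]
        med = statistics.median(proj)
        mad = statistics.median(abs(t - med) for t in proj)
        if mad == 0.0:
            continue  # degenerate projection, skip this direction
        ux = sum(ui * xi for ui, xi in zip(u, x))
        # abs() covers both u and -u, since Med and MAD are equivariant
        outlyingness = max(outlyingness, abs(ux - med) / mad)
    return 1.0 / (1.0 + outlyingness)


gen = random.Random(1)
data = [[gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)] for _ in range(200)]
depth_center = projection_depth([0.0, 0.0], data)
depth_far = projection_depth([5.0, 5.0], data)
```

Points near the center of the data cloud receive depth close to 1, while remote points receive depth close to 0, which is what the weight functions act on.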

To establish the asymptotics and influence function of the projection
depth weighted scatter estimators, some conditions on $\mu$ and $\sigma$ are needed.
Denote by $F_{nu}$ the empirical distribution function of $\{u'X_i, i = 1,\ldots,n\}$ for
any unit vector $u \in \mathbb{R}^d$.

(B1) $\sup_{\|u\|=1} |\mu(F_u)| < \infty$, $\sup_{\|u\|=1} \sigma(F_u) < \infty$ and $\inf_{\|u\|=1} \sigma(F_u) > 0$.
(B2) $\sup_{\|u\|=1} |\mu(F_{nu}) - \mu(F_u)| = O_p(1/\sqrt{n})$ and
     $\sup_{\|u\|=1} |\sigma(F_{nu}) - \sigma(F_u)| = O_p(1/\sqrt{n})$.

Conditions (B1) and (B2) hold for common choices of $(\mu,\sigma)$ and a wide
range of distributions; see Remark 2.4 of Zuo (2003) for a detailed discussion
[also see Zuo, Cui and He (2004)].

3.1. Large sample behavior and influence function.

3.1.1. General distributions.

$\sqrt{n}$-consistency and asymptotic normality. Denote by $PWS(\cdot)$ a PD
weighted scatter estimator. To establish the $\sqrt{n}$-consistency of $PWS(F_n)$,
we need the following lemma [Zuo (2003)]:

Lemma 3.1. Under (B1) and (B2),
$\sup_{x \in \mathbb{R}^d} (1 + \|x\|)\,|PD(x,F_n) - PD(x,F)| = O_p(1/\sqrt{n})$.

By the lemma, (A1) holds for PD with $r_0 = 0$ under (B1) and (B2). For
smooth $w_i$, $i = 1,2$, (A2) also holds since $\sup_{x \in \mathbb{R}^d} \|x\|\,PD(x,F) < \infty$ under
(B1) [see the proof of Theorem 2.3 of Zuo (2003)] and
$\int \|x\|\, w_i^{(1)}(PD(x,F))\,dF(x) \le C \int \|x\|\,PD(x,F)\,dF(x) < \infty$. These and
Theorem 2.1 lead to the next theorem.
Theorem 3.1. Assume that $w_1^{(1)}(r)$ is continuous and $w_2^{(1)}(r)$ is
Lipschitz continuous on $[0,1]$, $w_i^{(1)}(r) = O(r^i)$ for small $r > 0$, and
$\int w_i(PD(x,F))\,dF(x) > 0$, $i = 1,2$. Then under (B1) and (B2),
$PWS(F_n) - PWS(F) = O_p(1/\sqrt{n})$.

Maronna and Yohai (1995) showed the $\sqrt{n}$-consistency of the S-D scatter
estimator, a special case of $PWS(F_n)$ (and with $w_1 = w_2$). In Theorem 3.1,
$w_i^{(1)}(r) = O(r^i)$ for small $r > 0$ can be relaxed to $w_i(0) = 0$ and
$w_2^{(1)}(0) = 0$, $i = 1,2$. Note that $w_i$ in (4) can serve as $w_i$ in Theorem 3.1.

For smooth $w_i$, $i = 1,2$, in Theorem 3.1, it is readily seen that (A3) holds
with $r_0 = 0$ under (B1). To establish the asymptotic normality of $PWS(F_n)$,
we need to verify (A4). For any $x$ let $u(x)$ be the set of unit vectors $u$
satisfying $O(x,F) = (u'x - \mu(F_u))/\sigma(F_u)$. If $u(x)$ is a singleton, we also use
$u(x)$ as the unique direction. If $X$ is a continuous random variable,
nonuniqueness of $u(x)$ may occur at finitely many points. Define the following
conditions:

(C1) $\mu(F_u)$ and $\sigma(F_u)$ are continuous in $u$, $\sigma(F_u) > 0$, and $u(x)$ is a
     singleton except for points $x \in A \subset \mathbb{R}^d$ with $P(A) = 0$.

(C2) The asymptotic representations
     $\mu(F_{nu}) - \mu(F_u) = \frac{1}{n}\sum_{i=1}^{n} f_1(X_i,u) + o_p(1/\sqrt{n})$ and
     $\sigma(F_{nu}) - \sigma(F_u) = \frac{1}{n}\sum_{i=1}^{n} f_2(X_i,u) + o_p(1/\sqrt{n})$
     hold uniformly for $u$, the graph set of $\{f_j(X,u) : \|u\| = 1\}$ forms a
     polynomial set class with $E(f_j(X,u)) = 0$ for any $\|u\| = 1$,
     $E\big(\sup_{\|u\|=1} f_j^2(X,u)\big) < +\infty$ and
     $E\big(\sup_{|u_1 - u_2| \le \delta} |f_j(X,u_1) - f_j(X,u_2)|^2\big) \to 0$ as $\delta \to 0$, $j = 1,2$.

For details on polynomial set classes, see Pollard (1984). (C1) and (C2)
hold for general M-estimators of location and scale and a wide range of
distributions; see Zuo, Cui and He (2004) for further discussion. Under these
conditions we obtain the following [Zuo, Cui and He (2004)].



Lemma 3.2. Under conditions (C1) and (C2), there exists a sequence of
sets $S_n \subset \mathbb{R}^d$ such that $1 - P\{S_n\} = o(1)$ and
$H_n(x) = \int h(x,y)\,d\nu_n(y) + o_p(1)$ uniformly over $S_n$ with

(9)  $h(x,y) = \big(O(x,F)\, f_2(y,u(x)) + f_1(y,u(x))\big) \big/ \big(\sigma(F_{u(x)})(1 + O(x,F))^2\big).$

Hence, for smooth $w_i$, $i = 1,2$, in Theorem 3.1, (A4) holds for PD under
(B1) and (C1) and (C2) with $r_0 = 0$ [see Section 2.10.2 of van der Vaart and
Wellner (1996) for the verification of a Donsker class]. In light of Theorem
2.2 for general depth weighted scatter estimators, we have the following:
Theorem 3.2. For $w_i$, $i = 1,2$, in Theorem 3.1 and under (B1) and
(B2) and (C1) and (C2),

$PWS(F_n) - PWS(F) = \dfrac{1}{n} \sum_{i=1}^{n} \big(K(X_i) - E(K(X))\big) + o_p\big(1/\sqrt{n}\big),$

where $K(x) = K_s(x,F) - K_1(x,F)(L_2(F) - L(F))' - (L_2(F) - L(F))(K_1(x,F))'$.
Hence

$\sqrt{n}\,(\mathrm{vec}(PWS(F_n)) - \mathrm{vec}(PWS(F))) \xrightarrow{d} N(0, V),$

where $V$ is the covariance matrix of $\mathrm{vec}(K(X))$.

Influence function. Now we derive the influence function of the
projection depth weighted scatter matrices. First we need the following lemma
[Zuo, Cui and Young (2004)].

Lemma 3.3. Assume that (C1) holds and the influence functions
$IF(u'y; \mu, F_u)$ and $IF(u'y; \sigma, F_u)$ exist and are continuous for a given
$y \in \mathbb{R}^d$ at $u = u(x)$ which is a singleton. Then

(10)  $IF(y; PD(x,F), F) = \dfrac{O(x,F)\, IF((u(x))'y; \sigma, F_{u(x)}) + IF((u(x))'y; \mu, F_{u(x)})}{\sigma(F_{u(x)})(1 + O(x,F))^2}.$

Condition (B1) holds automatically under the conditions of this lemma
and, consequently, it can be shown that (A1') holds with $r_0 = 0$. By
Theorem 2.3 we have the next theorem.

Theorem 3.3. Under the conditions of Lemma 3.3 and for smooth $w_i$,
$i = 1,2$, in Theorem 3.1,

$IF(y; PWS, F) = K_s(y,F) - K_1(y,F)(L_2(F) - L(F))' - (L_2(F) - L(F))(K_1(y,F))'.$

The influence function $IF(y; PWS, F)$ in Theorem 3.3 can be shown (details
skipped) to be uniformly bounded in $y \in \mathbb{R}^d$ (with respect to a matrix
norm). Thus, $\gamma^*(PWS,F) < \infty$.

3.1.2. Elliptically symmetric distributions. Now we focus on elliptically
symmetric $F$ and $(\mu,\sigma) = (\mathrm{Med}, \mathrm{MAD})$. $X \sim F_{\theta,\Sigma}$ is elliptically symmetric
about $\theta$ with a positive definite matrix $\Sigma$ associated if for any unit
vector $u$, $u'(X - \theta) \overset{d}{=} \sqrt{u'\Sigma u}\,Y$ with $Y \overset{d}{=} -Y$, where "$\overset{d}{=}$" stands for
"equal in distribution." First we have this lemma:

Lemma 3.4. Let $\mathrm{MAD}(Y) = m_0$ and the density $p(y)$ of $Y$ be continuous
with $p(0)p(m_0) > 0$. Then $u(x)$ is a singleton except at $x = \theta$, and (B1) and
(B2) and (C1) and (C2) hold with

$f_1(x,u) = \sqrt{u'\Sigma u}\,\big(\tfrac{1}{2} - I\{u'(x - \theta) \le 0\}\big)/p(0),$

$f_2(x,u) = \sqrt{u'\Sigma u}\,\big(\tfrac{1}{2} - I\{|u'(x - \theta)| \le m_0\sqrt{u'\Sigma u}\}\big)/(2p(m_0)).$

The main part of the proof is largely based on Cui and Tian (1994) and the
details are skipped. Asymptotic normality (and consistency) of $PWS(F_n)$
follows immediately from this lemma and Theorem 3.2. The covariance
matrix $V$ in Theorem 3.2 can be concretized.

Asymptotic normality. Note that $Z = \Sigma^{-1/2}(X - \theta) \sim F_0$ is spherically
symmetric about the origin and $U = (U_1,\ldots,U_d)' = Z/\|Z\|$ is uniformly
distributed on the unit sphere $\{x \in \mathbb{R}^d : \|x\| = 1\}$ and is independent of $\|Z\|$
[Muirhead (1982)]. Define

$s_0(x) = 1/(1 + x/m_0),$
$s_i(x) = E\big(U_1^{2(i-1)}\,\mathrm{sign}(|U_1|x - m_0)\big), \quad i = 1,2,$
$c_0 = E\,w_2(s_0(\|Z\|)),$
$c_1 = E\big(\|Z\|^2\, w_2(s_0(\|Z\|))\big)/(d\,c_0),$
$c_j = E\big(\|Z\|^{2j-3}\, s_0^2(\|Z\|)\, w_2^{(1)}(s_0(\|Z\|))\big)/(4 m_0^2\, p(m_0)), \quad j = 2,3,$
$t_1(x) = c_3\big(s_2(x) - (s_1(x) - s_2(x))/(d-1)\big) + x^2 w_2(s_0(x)),$
$t_2(x) = c_3(s_1(x) - s_2(x))/(d-1) - c_1 c_2 s_1(x) - c_1 w_2(s_0(x)),$

where $(s_1(x) - s_2(x))/(d-1)$ is defined to be 0 when $d = 1$.

Corollary 3.1. Under the condition of Lemma 3.4 and for $w_i$, $i = 1,2$,
in Theorem 3.1,

$PWS(F_n) - PWS(F) = \dfrac{1}{n} \sum_{i=1}^{n} K(X_i) + o_p\big(1/\sqrt{n}\big)$

with $K(X) = \Sigma^{1/2}\big(t_1(\|Z\|)\,UU' + t_2(\|Z\|)\,I_d\big)\Sigma^{1/2}/c_0$ and

$\sqrt{n}\,(\mathrm{vec}(PWS(F_n)) - \mathrm{vec}(PWS(F))) \xrightarrow{d} N(0, V)$

with $V = \sigma_1 (I_{d^2} + K_{d,d})(\Sigma \otimes \Sigma) + \sigma_2\, \mathrm{vec}(\Sigma)\,\mathrm{vec}(\Sigma)'$, where
$\sigma_1 = \frac{1}{d(d+2)c_0^2}\,E\,t_1^2(\|Z\|)$,
$\sigma_2 = \sigma_1 + \frac{2}{d\,c_0^2}\,E\big(t_1(\|Z\|)\,t_2(\|Z\|)\big) + \frac{1}{c_0^2}\,E\,t_2^2(\|Z\|)$, and $K_{d,d}$ is
a $d^2 \times d^2$ block matrix with $(i,j)$-block being equal to $\Delta_{ji}$, where $\Delta_{ji}$ is a
$d \times d$ matrix which is 1 at entry $(j,i)$ and 0 everywhere else, $i,j = 1,\ldots,d$.
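
The block matrix $K_{d,d}$ appearing in $V$ is the usual commutation matrix, characterized by $K_{d,d}\,\mathrm{vec}(A) = \mathrm{vec}(A')$. The following sketch builds it from the block description, reading the $(i,j)$ block as the matrix with a single 1 at entry $(j,i)$, and checks that identity on a small example. Matrices are plain lists of rows and all helper names are hypothetical.

```python
def vec(m):
    """Stack the columns of a matrix (list of rows) into one list."""
    rows, cols = len(m), len(m[0])
    return [m[i][j] for j in range(cols) for i in range(rows)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def commutation_matrix(d):
    """K_{d,d}: the (i, j) block is the d x d matrix with a 1 at entry (j, i)."""
    k = [[0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            # block (i, j) starts at row i*d and column j*d;
            # inside it, the single 1 sits at row j, column i
            k[i * d + j][j * d + i] = 1
    return k


a = [[1, 2], [3, 4]]
k = commutation_matrix(2)
```

Because $(I_{d^2} + K_{d,d})$ symmetrizes $\mathrm{vec}(\cdot)$, the first term of $V$ reflects that the estimator is a symmetric matrix.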
Asymptotic relative efficiency. With asymptotic normality established
above, we now are in a position to study the asymptotic relative efficiency
of the scatter estimator $PWS(F_n)$. We shall focus on its estimation of
the "shape" of $\Sigma$, that is, its "shape component"; see Tyler (1983) and
Kent and Tyler (1996) for detailed arguments. For a given shape measure
$\phi$, $H(\phi; PWS, F) = \phi(\Sigma^{-1/2}\,PWS(F)\,\Sigma^{-1/2})$ measures the shape (or bias)
of $PWS(F)$ with respect to $\Sigma$. It clearly is affine invariant. One example of
$\phi$ is the likelihood ratio test statistic $\phi_0$ measuring the ellipticity
(sphericity) of any positive definite $T$ [see Muirhead (1982), also see Maronna
and Yohai (1995)],

$\phi_0(T) = (\mathrm{trace}(T)/d)^d/\det(T).$
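
The sphericity measure $\phi_0$ is easy to compute directly. The sketch below is a small pure-Python illustration (the determinant helper uses Gaussian elimination with partial pivoting and is a generic routine, not something specified in the paper; both function names are hypothetical). It equals 1 for any multiple of the identity and exceeds 1 otherwise, by the arithmetic-geometric mean inequality applied to the eigenvalues.

```python
def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, det = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return det


def sphericity(t):
    """phi_0(T) = (trace(T)/d)^d / det(T) for a positive definite T."""
    d = len(t)
    trace = sum(t[i][i] for i in range(d))
    return (trace / d) ** d / determinant(t)


value = sphericity([[1.0, 0.0], [0.0, 4.0]])  # (5/2)^2 / 4
```

Since $\phi_0(cT) = \phi_0(T)$ for any $c > 0$, the measure depends only on the shape of $T$ and not on its overall scale.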

For this $\phi_0$, $n \log(H(\phi_0; PWS, F_n))$ has a limiting distribution. More
generally, we have the following:

Theorem 3.4. Assume that the scatter functional $S(\cdot)$ is affine equivariant
and for elliptically symmetric $F_{\theta,\Sigma}$, $S(F) = c\Sigma$ for some $c > 0$ and
$\sqrt{n}\,(\mathrm{vec}(S(F_n)) - \mathrm{vec}(S(F))) \xrightarrow{d} N(0,V)$ with
$V = s_1(I_{d^2} + K_{d,d})(\Sigma \otimes \Sigma) + s_2\,\mathrm{vec}(\Sigma)\,\mathrm{vec}(\Sigma)'$, for some $s_i > 0$, $i = 1,2$. Then

$n \log\big(\phi_0(\Sigma^{-1/2} S(F_n) \Sigma^{-1/2})\big) \xrightarrow{d} \dfrac{s_1}{c^2}\,\chi^2_{(d-1)(d+2)/2}$ as $n \to \infty$.

The details of the proof are skipped, but the main ideas follow.
By affine equivariance of $S(\cdot)$, assume $\Sigma = I_d$. Then we can write
$S(F_n) = c(I_d + n^{-1/2} Z/c)$ with $N(0,V)$ as the asymptotic distribution of
$\mathrm{vec}(Z)$, where $Z = (z_{ij})$. Now expand $n \log(\phi_0(\Sigma^{-1/2} S(F_n) \Sigma^{-1/2}))$ and write

$n \log\big(\phi_0(\Sigma^{-1/2} S(F_n) \Sigma^{-1/2})\big) = \big(\mathrm{trace}(Z^2) - (\mathrm{trace}(Z))^2/d\big)/(2c^2) + O_p(n^{-1/2})$
$\qquad = z'Bz/c^2 + O_p(n^{-1/2}),$

with $z = (z_{11}/\sqrt{2},\ldots,z_{dd}/\sqrt{2}, z_{12},\ldots,z_{1d}, z_{23},\ldots,z_{(d-1)d})'$ and
$B = \mathrm{diag}(I_d - \mathbf{1}\mathbf{1}'/d,\; I_{d(d-1)/2})$, where $\mathbf{1} = (1)_{d \times 1}$. Let $A$ be the asymptotic
covariance matrix of $z$. Then $BAB = s_1 B$. The desired result follows since
the rank of $B$ is $(d-1)(d+2)/2$. For related discussion see Muirhead (1982).

In light of Theorem 3.4, for $PWS(F_n)$, $s_i = \sigma_i$, $i = 1,2$, and $c = c_1$ are
given in Corollary 3.1; for the sample covariance matrix $COV(F_n)$, $c = 1$ and
$s_1 = 1 + \kappa$ if $F_{\theta,\Sigma}$ has kurtosis $3\kappa$ [Tyler (1982)]. Clearly, the ratio
$c_1^2(1 + \kappa)/\sigma_1$ measures the asymptotic relative efficiency (ARE) of $PWS(F_n)$
with respect to $COV(F_n)$ at the given model $F_{\theta,\Sigma}$. The same idea was employed
in Tyler (1983) to compute AREs of scatter estimators. At the multivariate
normal model, $\kappa = 0$, hence the ratio $c_1^2/\sigma_1$ is the ARE of $PWS(F_n)$ with
respect to $COV(F_n)$.
Consider $w_i$, $i = 1,2$, in (4). They are selected to meet the requirements
in Theorem 3.1 and to down-weight exponentially less deep points to get
better performance of PWS. Also, appropriate tuning of $C$ and $K$ can lead
to highly efficient (and robust) PWS [see Zuo, Cui and He (2004) for related
comments]. The behavior of $w_2$ is depicted in Figure 1 with $C = 0.32$ and
$K = 0.2$.

Table 1 reports the AREs of $PWS(F_n)$ [with respect to $COV(F_n)$] versus
the dimension $d$ and selected $C$ and $K$ at $N(0, I_d)$ with $w_2$ above. Here we
select $C$'s that are close to $\mathrm{Med}(PD(X,F))$ to get better performance of
PWS. It is seen that $PWS(F_n)$ possesses very high ARE for suitable $K$ and
$C$, which, in fact, approaches 100% rapidly as the dimension $d$ increases.
Note that the ARE of $PWS(F_n)$ here does not depend on that of the
underlying projection depth weighted mean (PWM). The ARE of the latter
depends on $w_1$ and behaves like that of $PWS(F_n)$ [Zuo, Cui and He (2004)].



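Influence function. Under the condition of Lemma 3.4, it can be shown
that, for $\theta = 0$ (the general case follows on replacing $x$ and $y$ by $x - \theta$
and $y - \theta$),

$IF(u(x)'y; \mathrm{Med}, F_{u(x)}) = \dfrac{\|\Sigma^{-1/2}x\|}{2p(0)\,\|\Sigma^{-1}x\|}\,\mathrm{sign}(x'\Sigma^{-1}y),$

$IF(u(x)'y; \mathrm{MAD}, F_{u(x)}) = \dfrac{\|\Sigma^{-1/2}x\|}{4p(m_0)\,\|\Sigma^{-1}x\|}\,\mathrm{sign}\big(|x'\Sigma^{-1}y| - m_0\|\Sigma^{-1/2}x\|\big).$

These functions are continuous at $u(x)$ almost surely. By Lemmas 3.4 and
3.3 we have

$IF(x; PD(y,F), F) = \dfrac{s_0^2(\|\Sigma^{-1/2}y\|)}{m_0}\left(\dfrac{\|\Sigma^{-1/2}y\|\,\mathrm{sign}\big(|y'\Sigma^{-1}x| - m_0\|\Sigma^{-1/2}y\|\big)}{4 m_0\, p(m_0)} + \dfrac{\mathrm{sign}(y'\Sigma^{-1}x)}{2p(0)}\right).$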

Fig. 1. The behavior of $w_2(r)$ with $C = 0.32$ and $K = 0.2$. Left: $w_2(r)$. Right: $w_2^{(1)}(r)$.
Table 1
The asymptotic relative efficiency of PWS versus the dimension d

        C = 1/(1 + \sqrt{d}/\Phi^{-1}(3/4))      C = 1/(1 + \sqrt{2d})
   d       K = 2        K = 3                     K = 2        K = 3
   2       0.922        0.883                     0.904        0.862
   3       0.957        0.933                     0.945        0.918
   4       0.976        0.959                     0.969        0.945
   5       0.980        0.974                     0.979        0.965
   6       0.989        0.980                     0.983        0.974
   7       0.990        0.986                     0.986        0.980
   8       0.993        0.991                     0.991        0.985
   9       0.994        0.992                     0.992        0.987
  10       0.995        0.993                     0.994        0.980
  15       0.998        0.998                     0.996        0.995
  20       1.00         0.999                     0.999        0.997
  30       1.00         1.00                      1.00         0.999

By virtue of Theorem 3.3, we have the next corollary.

Corollary 3.2. Under the condition of Lemma 3.4 and for $w_i$, $i = 1,2$,
in Theorem 3.1,

\[
\mathrm{IF}(x; \mathrm{PWS}, F_{0,I_d}) = (t_1(\|x\|)xx'/\|x\|^2 + t_2(\|x\|)I_d)/c_0,
\]
\[
\mathrm{IF}(x; \mathrm{PWS}, F_{\theta,\Sigma}) = \Sigma^{1/2}(\mathrm{IF}(\Sigma^{-1/2}(x - \theta); \mathrm{PWS}, F_{0,I_d}))\Sigma^{1/2}.
\]

Figure 2 indicates that IF($x$; PWS, $F_{0,I_2}$) is uniformly bounded in
$x \in \mathbb{R}^2$ relative to a matrix norm.

Maintaining a good balance between high efficiency and a bounded influ- 
ence function is always a legitimate concern for estimators. Many existing 




Fig. 2. The behavior of IF($x$; PWS, $F_{0,I_2}$) with $w_2$ in (4). Left: the (1,1) entry. Right:
the (1,2) entry.
Table 2
The ARE and the gross error sensitivity index $G_2$ of
scatter (location) estimators

  d     Estimator     ARE                  $G_2$
  2     $\tau$(CM)-   0.8670 (0.9057)      1.415 (1.861)
        PWS           0.8810 (0.9152)      1.318 (1.818)
  5     $\tau$(CM)-   0.9099 (0.9354)      1.275 (2.588)
        PWS           0.9180 (0.9516)      1.057 (2.546)
 10     $\tau$(CM)-   0.9505 (0.9606)      1.224 (3.425)
        PWS           0.9620 (0.9734)      0.979 (3.421)



high breakdown estimators fail to do so though. CM- [Kent and Tyler (1996)]
and $\tau$- [Lopuhaä (1999)] estimators are among the few exceptions. In light
of these papers, we consider a gross error sensitivity index for the shape of
the scatter estimator $S$,

\[
G_2(S, F) = \mathrm{GES}(S, F)/((1 + 2/d)(1 - 1/d)^{1/2}),
\]

where GES($S, F$) is the gross-error-sensitivity of $S(F)/\mathrm{trace}(S(F))$, the
shape component of the scatter functional $S(F)$. In our case it is seen that
$G_2(\mathrm{PWS}, F) = \sup_{r>0} t_1(r)/(c_0(d + 2))$. Table 2 reports the ARE and $G_2$ of
scatter estimators (along with those of the corresponding location estimators
listed in parentheses; in the location case $G_2 = \gamma^*$) for $d = 2$, 5 and 10.

Table 2 lists only the ARE and $G_2$ for $\tau$- and PWS estimators. The
corresponding indices for the CM-estimators are omitted since they are almost
the same as those of the $\tau$-estimators. The indices for $\tau$(CM)-estimators are
obtained by optimizing $G_2$ of the corresponding location estimators based
on Tukey's biweight function [Kent and Tyler (1996) and Lopuhaä (1999)].
The weight function $w_2$ in (4) is employed in our calculation for the indices
of PWS [and $w_1$ in (4) for PWM] with $K = 3$ and $C = 1/(1 + \xi_d\sqrt{d})$, where
$\xi_2 = 2.3$, $\xi_5 = 1.2$ and $\xi_{10} = 0.9$ for PWS [and $\xi_d = 1.2$ for PWM]. The values
of $C$ here are slightly different from those in Table 1 to get (nearly) optimal
ARE and $G_2$ simultaneously. Inspecting Table 2 reveals that, compared with
leading competitors, the projection depth weighted scatter estimator PWS
behaves very well overall.

Maximum bias. Define the maximum bias of a scatter matrix $S$ under an
$\varepsilon$ amount of contamination at $F$ as $B(\varepsilon; S, F) = \sup_G |||S(F(\varepsilon, G)) - S(F)|||$,
where $G$ is any distribution in $\mathbb{R}^d$. The contamination sensitivity of $S$ at $F$
is defined as $\gamma(S, F) = \lim_{\varepsilon \to 0+} \sup_G |||(S(F(\varepsilon, G)) - S(F))/\varepsilon|||$; see He and
Simpson (1993) for a related definition for location estimators. $B(\varepsilon; S, F)$
is the maximum deviation (bias) of $S$ under an $\varepsilon$ amount of contamination
at $F$, and measures mainly the global robustness of $S$. $\gamma(S, F)$ indicates
the maximum relative effect on $S$ of an infinitesimal contamination at $F$,
and measures the local, as well as global, robustness of $S$. The minimum
amount $\varepsilon^*$ of contamination at $F$ which leads to an unbounded $B(\varepsilon; S, F)$
is called the (asymptotic) breakdown point (BP) of $S$ at $F$, that is, $\varepsilon^* =
\min\{\varepsilon : B(\varepsilon; S, F) = \infty\}$.

In many cases, the maximum bias is attained by a point-mass distribution;
see Huber (1964), Martin, Yohai and Zamar (1989), Chen and Tyler (2002)
and Zuo, Cui and Young (2004). In the following, we derive the maximum
bias and contamination sensitivity of the shape component of PWS under
point-mass contamination. We conjecture that our results hold for general
contamination. For any $0 < \varepsilon < 1/2$ and $c \in \mathbb{R}$, define $d_1 = d_1(\varepsilon)$, $m_i(c, \varepsilon)$,
$i = 1,2$, by

\[
P(Y \le d_1(\varepsilon)) = \frac{1}{2(1-\varepsilon)}, \qquad
P(|Y - c| \le m_1(c,\varepsilon)) = \frac{1 - 2\varepsilon}{2(1-\varepsilon)}, \qquad
P(|Y - c| \le m_2(c,\varepsilon)) = \frac{1}{2(1-\varepsilon)}
\]

(assume that $d_1$, $m_1$, $m_2$ are well defined). For $x \in \mathbb{R}^d$, write $x' = (x_1, x_2')$
with $x_1 = x_{11} \in \mathbb{R}$ and $x_2 = (x_{21}, \ldots, x_{2(d-1)})' \in \mathbb{R}^{d-1}$. Likewise, partition
the unit vector $u \in \mathbb{R}^d$. For any $r > 0$, define

\[
f_1(x, r, \varepsilon) = \sup_{0 \le u_1 \le 1} \frac{\sqrt{1 - u_1^2}\,\|x_2\| + |u_1 x_1 - f_4(u_1, r, d_1)|}{f_3(u_1, r, d_1)},
\]
\[
f_2(r, \varepsilon) = \sup_{0 \le u_1 \le 1} \frac{|u_1 r - f_4(u_1, r, d_1)|}{f_3(u_1, r, d_1)},
\]

with $f_3(u_1, r, d_1)$ being the median of $\{m_1(f_4(u_1, r, d_1), \varepsilon), |u_1 r - f_4(u_1, r, d_1)|, m_2(f_4(u_1, r, d_1), \varepsilon)\}$, and
$f_4(u_1, r, d_1)$ being the median of $\{-d_1, u_1 r, d_1\}$ ($\varepsilon$ is suppressed in $f_3$ and $f_4$).
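
As a worked illustration (an assumption made here for concreteness, not a case singled out in the text), suppose $Y$ is standard normal with distribution function $\Phi$. The defining equations for $d_1$, $m_1$ and $m_2$ above can then be solved in closed form for $d_1$ and, at $c = 0$, for $m_1$ and $m_2$, using $P(|Y| \le m) = 2\Phi(m) - 1$:

```latex
d_1(\varepsilon) = \Phi^{-1}\!\left(\frac{1}{2(1-\varepsilon)}\right), \qquad
m_1(0,\varepsilon) = \Phi^{-1}\!\left(\frac{3-4\varepsilon}{4(1-\varepsilon)}\right), \qquad
m_2(0,\varepsilon) = \Phi^{-1}\!\left(\frac{3-2\varepsilon}{4(1-\varepsilon)}\right).
```

At $\varepsilon = 0$ these reduce to $d_1 = 0$ and $m_1 = m_2 = \Phi^{-1}(3/4)$, the median and MAD of $N(0,1)$, and all three arguments of $\Phi^{-1}$ stay below 1 for every $\varepsilon < 1/2$.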



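Define, for $i = 1,2$,

\[
\eta_i(r,\varepsilon) = (1-\varepsilon)\int w_i\!\left(\frac{1}{1 + f_1(x,r,\varepsilon)}\right) dF_0(x), \qquad
\phi_i(r,\varepsilon) = (1-\varepsilon)\int x_1\, w_i\!\left(\frac{1}{1 + f_1(x,r,\varepsilon)}\right) dF_0(x),
\]
\[
\psi_i(r,\varepsilon) = (1-\varepsilon)\int x_{i1}^2\, w_2\!\left(\frac{1}{1 + f_1(x,r,\varepsilon)}\right) dF_0(x), \qquad
\gamma_i(r,\varepsilon) = \varepsilon\, w_i\!\left(\frac{1}{1 + f_2(r,\varepsilon)}\right),
\]
\[
\begin{aligned}
b_1(r,\varepsilon) ={}& \frac{\psi_1(r,\varepsilon) - \psi_2(r,\varepsilon) + \gamma_2(r,\varepsilon)r^2}{\eta_2(r,\varepsilon) + \gamma_2(r,\varepsilon)}
 + \frac{(\phi_1(r,\varepsilon) + \gamma_1(r,\varepsilon)r)^2}{(\eta_1(r,\varepsilon) + \gamma_1(r,\varepsilon))^2} \\
& - 2\,\frac{\phi_1(r,\varepsilon)\phi_2(r,\varepsilon) + (\phi_1(r,\varepsilon)\gamma_2(r,\varepsilon) + \phi_2(r,\varepsilon)\gamma_1(r,\varepsilon))r}{(\eta_1(r,\varepsilon) + \gamma_1(r,\varepsilon))(\eta_2(r,\varepsilon) + \gamma_2(r,\varepsilon))} \\
& - 2\,\frac{\gamma_1(r,\varepsilon)\gamma_2(r,\varepsilon)r^2}{(\eta_1(r,\varepsilon) + \gamma_1(r,\varepsilon))(\eta_2(r,\varepsilon) + \gamma_2(r,\varepsilon))},
\end{aligned}
\]
\[
b_2(r,\varepsilon) = \psi_2(r,\varepsilon)/(\eta_2(r,\varepsilon) + \gamma_2(r,\varepsilon)) - c_1.
\]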

For any $y \in \mathbb{R}^d$ denote $\tilde y = \Sigma^{-1/2}(y - \theta)$. We have the next theorem:

Theorem 3.5. Under the condition of Lemma 3.4 and for any $\varepsilon > 0$
and $y \in \mathbb{R}^d$,

\[
\mathrm{PWS}(F(\varepsilon,\delta_y)) - \mathrm{PWS}(F) = \Sigma^{1/2}\big(b_1(\|\tilde y\|,\varepsilon)\,\tilde y\tilde y'/\|\tilde y\|^2 + b_2(\|\tilde y\|,\varepsilon) I_d\big)\Sigma^{1/2}.
\]

For weight functions $w_i$, $i = 1,2$, in Theorem 3.1, it can be shown that for
any $\varepsilon < 1/2$, $\mathrm{trace}(\mathrm{PWS}(F(\varepsilon,\delta_y)) - \mathrm{PWS}(F))$ is uniformly bounded with
respect to $y \in \mathbb{R}^d$. Hence we have the following:

Corollary 3.3. Under the condition of Lemma 3.4 and for weight
functions $w_i$, $i = 1,2$, in Theorem 3.1, $\varepsilon^*(\mathrm{PWS}, F) = 1/2$.

Focusing again on the shape component of PWS and based on the result
in Theorem 3.5, we can define in a straightforward fashion a gross error
sensitivity index (GESI), a maximum bias index (MBI) and a contamination
sensitivity index (CSI), respectively, as follows:

\[
\mathrm{GESI}(\mathrm{PWS},F) = \sup_{y \in \mathbb{R}^d} \Big|\Big|\Big| \lim_{\varepsilon \to 0+} b_1(\|\tilde y\|,\varepsilon)\,\Sigma^{1/2}(\tilde y\tilde y'/\|\tilde y\|^2)\Sigma^{1/2}/\varepsilon \Big|\Big|\Big|,
\]
\[
\mathrm{MBI}(\varepsilon;\mathrm{PWS},F) = \sup_{y \in \mathbb{R}^d} |||b_1(\|\tilde y\|,\varepsilon)\,\Sigma^{1/2}(\tilde y\tilde y'/\|\tilde y\|^2)\Sigma^{1/2}|||,
\]
\[
\mathrm{CSI}(\mathrm{PWS},F) = \lim_{\varepsilon \to 0+} \sup_{y \in \mathbb{R}^d} |||b_1(\|\tilde y\|,\varepsilon)\,\Sigma^{1/2}(\tilde y\tilde y'/\|\tilde y\|^2)\Sigma^{1/2}/\varepsilon|||.
\]

In view of Corollary 3.2, it can be seen that $\mathrm{GESI}(\mathrm{PWS}, F) = \lambda_1
\sup_{r>0} |t_1(r)|/c_0$, which is $\le \mathrm{CSI}(\mathrm{PWS}, F)$, where $\lambda_1$ is the largest
eigenvalue of $\Sigma$. Note that under point-mass contamination the only difference
between CSI and GESI is the order in which the suprema and the limits are
taken in their respective definitions above. This might tempt one to believe
that these two sensitivity indices are the same if it is taken for granted that
the order in which the supremum and the limit are taken is interchangeable.
Unfortunately, this is not always the case [see, e.g., Chen and Tyler (2002)].
Fig. 3. The behavior of the maximum bias (index) of PWS and MAD. Left: maximum
bias indices of PWS and MAD. Right: maximum biases of PWS and MAD.

In the following, we prove that for PWS, the order is interchangeable and
CSI(PWS, $F$) is the same as GESI(PWS, $F$). The proof and the derivation of
the following result, given in the Appendix, are rather technically demanding
and have no precedent in the literature.

Theorem 3.6. Under the condition of Lemma 3.4 and for $w_i$, $i = 1,2$,
in Theorem 3.1:

(a) $\mathrm{MBI}(\varepsilon; \mathrm{PWS}, F) = \lambda_1 \sup_{r>0} b_1(r, \varepsilon)$ and

(b) $\mathrm{CSI}(\mathrm{PWS}, F) = \mathrm{GESI}(\mathrm{PWS}, F) = \lambda_1 \sup_{r>0} |t_1(r)|/c_0$.

The behavior of $\mathrm{MBI}(\varepsilon; \mathrm{PWS}, N(0, I_2))$ [and $B(\varepsilon; \mathrm{PWS}, N(0, I_2))$],
together with that of the (explosion) maximum bias of MAD at $N(0, 1)$,
$B(\varepsilon; \mathrm{MAD}, N(0, 1))$ (note that no separate shape and scale components
correspond to MAD, a univariate scale measure), as functions of $\varepsilon$ is revealed
in Figure 3. The slopes of the tangent lines at the origin represent the CSI
(or $\gamma$) of PWS and MAD. From the figures we see that the maximum bias
(index) of PWS is quite moderate (and slightly larger than that of the
univariate scale measure MAD) and it increases very slowly as the amount of
contamination $\varepsilon$ increases and jumps to infinity as $0.45 < \varepsilon \to 1/2$, confirming
that the asymptotic breakdown point of PWS is $1/2$.

3.2. Finite sample behavior. In this section the finite sample robustness
and relative efficiency of PWS($F_n$) are investigated. Finite sample results in
this section confirm the asymptotic results in the last section.

3.2.1. Finite sample breakdown point. Let $X^n = \{X_1, \ldots, X_n\}$ be a
sample of size $n$ from $X$ in $\mathbb{R}^d$ ($d \ge 1$). The replacement breakdown point (RBP)
[Donoho and Huber (1983)] of a scatter estimator $V$ at $X^n$ is defined as

\[
\mathrm{RBP}(V, X^n) = \min\Big\{ \frac{m}{n} : \mathrm{trace}\big(V(X^n)V(X_m^n)^{-1} + V(X^n)^{-1}V(X_m^n)\big) = \infty \Big\},
\]

where $X_m^n$ is a contaminated sample resulting from replacing $m$ points of
$X^n$ with arbitrary values.

In the following discussion of the RBP of the projection depth weighted
scatter estimators, $(\mu, \sigma) = (\mathrm{Med}, \mathrm{MAD}_k)$, where $\mathrm{MAD}_k$ is a modified MAD
which can lead to a slightly higher RBP. Similar ideas of modifying MAD
to achieve higher RBP were used in Tyler (1994) and Gather and Hilker
(1997). Here $\mathrm{MAD}_k(x^n) = \mathrm{Med}_k(\{|x_1 - \mathrm{Med}(x^n)|, \ldots, |x_n - \mathrm{Med}(x^n)|\})$, with
$\mathrm{Med}_k(x^n) = (x_{(\lfloor (n+k)/2 \rfloor)} + x_{(\lfloor (n+1+k)/2 \rfloor)})/2$, for $1 \le k \le n$, and $x_{(1)} \le \cdots \le
x_{(n)}$ being ordered values of $x_1, \ldots, x_n$ in $\mathbb{R}^1$ (note $\mathrm{MAD}_1 = \mathrm{MAD}$). Denote
by $\mathrm{PWS}_n^k$ the corresponding scatter estimator.
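
The order-statistic definitions of $\mathrm{Med}_k$ and $\mathrm{MAD}_k$ above translate directly into code. The following is a minimal sketch under the stated formulas; the function names `med_k` and `mad_k` are illustrative and do not come from the paper.

```python
from statistics import median


def med_k(values, k):
    """Med_k = (x_(floor((n+k)/2)) + x_(floor((n+1+k)/2))) / 2.

    The order statistics x_(1) <= ... <= x_(n) are 1-based, as in the text,
    so 1 is subtracted when indexing the sorted Python list.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    lower = (n + k) // 2
    upper = (n + 1 + k) // 2
    return 0.5 * (xs[lower - 1] + xs[upper - 1])


def mad_k(values, k):
    """MAD_k: Med_k of the absolute deviations from the ordinary median."""
    center = median(values)
    return med_k([abs(v - center) for v in values], k)
```

With `k = 1` both functions reduce to the ordinary median and MAD; larger `k` moves the selected order statistics upward, which is the mechanism behind the slightly higher RBP mentioned above.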

A random sample $X^n$ is said to be in general position if there are no more
than $d$ sample points of $X^n$ lying in any $(d - 1)$-dimensional subspace. Let
$\lfloor \cdot \rfloor$ be the floor function. We have the next theorem.

Theorem 3.7. Let $(\mu, \sigma) = (\mathrm{Med}, \mathrm{MAD}_k)$ and PD($x, F$) be the depth
function. Let $w_i(r)$ be continuous on $[0,1]$ and positive and $\le M_i r^i$ on
$(0, 1]$ for some $M_i > 0$, $i = 1,2$. Then for $X^n$ in general position ($n > 2d$),
$\mathrm{RBP}(\mathrm{PWS}_n^k, X^n) = \min\{\lfloor (n - k + 2)/2 \rfloor / n, \lfloor (n + k + 1 - 2d)/2 \rfloor / n\}$.

When $k = d$ or $d + 1$, $\mathrm{RBP}(\mathrm{PWS}_n^k, X^n) = \lfloor (n - d + 1)/2 \rfloor / n$, the upper
bound of the RBP of any affine equivariant scatter estimator; see Davies (1987).
The RBP of the Stahel-Donoho scatter estimator, a special case of $\mathrm{PWS}_n^k$,
has been given in Tyler (1994). Note that for the smooth $w_i$ in (A2), $w_i(r) \le
M_i r^i$ holds automatically, $i = 1,2$. The result in Theorem 3.7 holds true for
any $\mu$ and $\sigma$ that share the RBPs of Med and $\mathrm{MAD}_k$, respectively.
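
The RBP formula of Theorem 3.7 and the upper bound quoted from Davies (1987) are simple enough to tabulate. The sketch below only evaluates the two expressions as stated; the function names are illustrative.

```python
def rbp_pws(n, d, k):
    """RBP of PWS_n^k from Theorem 3.7, valid for n > 2d and 1 <= k <= n."""
    if n <= 2 * d:
        raise ValueError("Theorem 3.7 requires n > 2d")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return min((n - k + 2) // 2, (n + k + 1 - 2 * d) // 2) / n


def rbp_upper_bound(n, d):
    """Upper bound floor((n - d + 1) / 2) / n for affine equivariant scatter estimators."""
    return ((n - d + 1) // 2) / n
```

For example, with `n = 20` and `d = 5` the choice `k = 1` (the ordinary MAD) gives 6/20, while `k = 5` or `k = 6` gives 8/20, which equals `rbp_upper_bound(20, 5)`.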

3.2.2. Finite sample relative efficiency. We generate 400 samples from
the model $(1 - \varepsilon)N(0, I_2) + \varepsilon\delta_{(100,0)}$ with $\varepsilon = 0\%$, 10% and 20% for sample
sizes $n = 100, 200, \ldots, 1000$. An approximate algorithm with time complexity
$O(n^3)$ (for $d = 2$) is utilized for the computation of the $\mathrm{PD}_n(X_i)$, $i = 1, \ldots, n$,
and the projection depth weighted scatter matrix. $(\mu, \sigma) = (\mathrm{Med}, \mathrm{MAD})$ and
the weight functions $w_i(\cdot)$ defined in (4), with $C = 1/(1 + \sqrt{2}/\Phi^{-1}(3/4)) \approx
0.323$ and $K = 2$, are used in our simulation.
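
To make the simulation setup concrete, the following sketch computes the tuning constant $C$, draws a sample from the contamination model, and approximates the sample projection depth in $d = 2$. It is not the authors' $O(n^3)$ algorithm, which is not described in this section: the fixed grid of directions, the function names and the default of 180 directions are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, median


def tuning_constant(d):
    """C = 1 / (1 + sqrt(d) / Phi^{-1}(3/4)), the choice quoted in the text."""
    return 1.0 / (1.0 + math.sqrt(d) / NormalDist().inv_cdf(0.75))


def contaminated_sample(n, eps, rng):
    """Draw n points from (1 - eps) N(0, I_2) + eps * delta_(100, 0)."""
    points = []
    for _ in range(n):
        if rng.random() < eps:
            points.append((100.0, 0.0))
        else:
            points.append((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))
    return points


def projection_depths(points, n_dirs=180):
    """Approximate PD_n(X_i) = 1 / (1 + O_n(X_i)) with (mu, sigma) = (Med, MAD).

    The supremum over unit vectors u in the outlyingness
    O_n(x) = sup_u |u'x - Med(u'X)| / MAD(u'X) is replaced by a maximum over
    a grid of directions (assumption); a half circle suffices because the
    ratio is unchanged when u is replaced by -u.
    """
    outlyingness = [0.0] * len(points)
    for j in range(n_dirs):
        angle = math.pi * j / n_dirs
        u = (math.cos(angle), math.sin(angle))
        proj = [u[0] * p[0] + u[1] * p[1] for p in points]
        med = median(proj)
        mad = median(abs(v - med) for v in proj)
        for i, v in enumerate(proj):
            outlyingness[i] = max(outlyingness[i], abs(v - med) / mad)
    return [1.0 / (1.0 + o) for o in outlyingness]
```

A call such as `projection_depths(contaminated_sample(200, 0.1, random.Random(0)))` assigns the points at (100, 0) a depth close to zero, so any weight function that vanishes for small depth removes them from the weighted scatter matrix.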

We calculate for a scatter estimator $V_n$ the mean of the likelihood ratio
test (LRT) statistic $\mathrm{LRT}(V_n) = \frac{1}{m}\sum_{j=1}^m \phi_0(V_j)$ with $m = 400$ and $V_j$
being the estimate for the $j$th sample. In the case with $\varepsilon = 0\%$ (no
contamination), the mean of the $n \log$ likelihood ratio test (LLRT) statistic with
$\mathrm{LLRT}(V_n) = \frac{1}{m}\sum_{j=1}^m n \log(\phi_0(V_j))$ is calculated. The finite sample relative
efficiency (RE) of $V_n$ at $\varepsilon = 0\%$ is then obtained by dividing the LLRT of the
sample covariance matrix by that of $V_n$ [Maronna and Yohai (1995) used the
same measure for finite sample relative efficiency]. Some simulation results
are listed in Table 3.

The finite sample RE of PWS($F_n$) relative to the sample covariance
matrix at $N(0, I_2)$ increases from about 80% for $n = 20$ to 91% for $n = 100$
and is around 90%-93% and very stable for $n = 100, 200, \ldots, 1000$ [and is
very close to its asymptotic value 92.2% (listed in Table 1)]. In the
contamination cases, the results in Table 3 indicate that PWS($F_n$) is very robust,
whereas COV($F_n$) is very sensitive to outliers. For the special case of $\mathrm{PWS}_n$,
the Stahel-Donoho estimator, a related simulation study was conducted by
Maronna and Yohai (1995).

Though alternatives exist, the $w_2$ we select results in a very good
performance of $\mathrm{PWS}_n$ and satisfies all the requirements in the previous sections.
Note that smaller C can lead to a higher RE of PWS n under no contam- 
ination, while larger C can lead to a better performance of PWS n under 
contamination. The same is true for the parameter K. Moderate values of 
C and K thus are recommended (and are used in our simulation); see Zuo, 
Cui and He (2004) for related discussion. 

4. Concluding remarks. General depth weighted scatter estimators are 
introduced and studied. The estimators possess nice properties. In a very 
general setting, consistency and asymptotic normality of the estimators are 
established and their influence functions are derived. These general results 
are concretized and demonstrated via the projection depth weighted scatter 
estimators. The latter estimators include as a special case the Stahel-Donoho 
estimator, the first one constructed which combines affine equivariance and



Table 3
Mean of the likelihood ratio test statistic and relative efficiency

            $\varepsilon = 0\%$          $\varepsilon = 10\%$         $\varepsilon = 20\%$         RE
   n      PWS      COV       PWS      COV       PWS      COV       ($\varepsilon = 0\%$)
  100    1.022    1.021              234.09    1.523    420.80    0.913
  200    1.011    1.010     1        231.03    1.534    407.10    0.911
  300    1.007    1.006     1.106    230.04    1.528    405.72    0.900
  400    1.006    1.005     1.105    227.79    1.539    404.13    0.903
  500    1.004    1.004     1.103    227.18    1.555    404.43    0.901
  600    1.004    1.003     1.105    227.26    1.560    404.78    0.917
  700    1.003    1.003     1.103    227.37    1.545    403.20    0.930
  800    1.003    1.002     1.104    226.28    1.555    404.00    0.932
  900    1.002    1.002     1.103    226.27    1.549    401.45    0.923
 1000    1.002    1.002     1.102    226.19    1.543    401.75    0.926
high breakdown point, but had an unknown limiting distribution until this
paper.

Frequently high breakdown point affine equivariant estimators suffer from 
a low asymptotic relative efficiency and an unbounded influence function. 
The projection depth weighted scatter estimators are proven to be exceptions.
They combine the best possible breakdown point and a moderate maxi- 
mum bias curve (global robustness) and a bounded influence function (local 
robustness) and possess, in the meantime, a very high asymptotic relative 
efficiency at multivariate normal models. Simulations with clean and con- 
taminated data sets reveal that the global robustness and high efficiency 
properties hold at finite samples. 

Finally, we comment that the $w_i$ in this paper do not include indicator
functions. This allows us to treat general depth and distribution functions. 
To cover trimmed means (with indicator weight functions) , one has to impose 
more conditions on these functions (but the efficiency will be lower). 

APPENDIX: SELECTED (SKETCHES OF) PROOFS AND AUXILIARY LEMMAS

Proof of Theorem 2.1. Denote by $l_1(F)$ and $l_2(F)$ the numerator
and the denominator of $L(F)$, respectively, and $s_1(F)$ and $s_2(F)$ those of
$S(F)$, respectively. Write

\[
L(F_n) - L(F) = ((l_1(F_n) - l_1(F)) - L(F)(l_2(F_n) - l_2(F)))/l_2(F_n), \tag{11}
\]

\[
\begin{aligned}
S(F_n) - S(F) ={}& \Big(\int xx'w_2(D(x,F_n))\,dF_n(x) - \int xx'w_2(D(x,F))\,dF(x)\Big)\Big/ s_2(F_n) \\
& - S_0(F)(s_2(F_n) - s_2(F))/s_2(F_n) \\
& - (L_2(F_n) - L_2(F))(L(F_n))' - L_2(F)(L(F_n) - L(F))' \\
& - (L(F))(L_2(F_n) - L_2(F))' - (L(F_n) - L(F))(L_2(F_n))' \\
& + (L(F_n) - L(F))(L(F_n))' + L(F)(L(F_n) - L(F))',
\end{aligned} \tag{12}
\]

with $S_0(F) = \int xx'w_2(D(x,F))\,dF(x)/s_2(F)$. We now show that under (A1) and (A2),

\[
I_{in} = \int \|x\|^i\,|w_i(D(x,F_n)) - w_i(D(x,F))|\,dF_n(x) = O_p(1/\sqrt{n}), \qquad i = 1,2. \tag{13}
\]
By (A2), there exists a $\theta_{in}(x)$ between $D(x,F_n)$ and $D(x,F)$ such that for
$i = 1,2$,

\[
I_{in} \le \int \|x\|^i\,|w_i^{(1)}(\theta_{in}(x)) - w_i^{(1)}(D(x,F))|\,\frac{|H_n(x)|}{\sqrt{n}}\,dF_n(x)
+ \int \|x\|^i\,|w_i^{(1)}(D(x,F))|\,\frac{|H_n(x)|}{\sqrt{n}}\,dF_n(x).
\]



Call the two terms in the right-hand side and 1^ , respectively. Let n = 
ar . By (Al), £>(x, F) + su Pa;eRd |F(x,F n )-F(z,F)| = D(x,F)+O p (l/^E) > 
6 in (x). This and (A2) and (Al) lead to 

/«= / ||x|| 4 | W { 1) (M*))-w i (1) P(x,fO)|^^^n(a!) 
i{9 m (x)>n}UD ri V re 

J{D{x,F)+O p {l/yfti)>ri}UD rl \ sjn J \\ 

and 

4 2) = / ||x|K (1) (^,F ) )^PdF n( ,) = O p (^). 

Hence $I_{in} = O_p(1/\sqrt{n})$. Likewise we can show that

\[
\int w_i(D(x,F_n))\,dF_n(x) - \int w_i(D(x,F))\,dF(x) = O_p(1/\sqrt{n}). \tag{14}
\]

Let $h(x) = xx'$, $x$ or 1. It follows from displays (13) and (14) and the CLT
that

\[
\int h(x)w_i(D(x,F_n))\,dF_n(x) - \int h(x)w_i(D(x,F))\,dF(x) = O_p(1/\sqrt{n}).
\]

By (11), the boundedness of $L(F)$ and $l_1(F)$, and the fact that $l_1(F_n) =
l_1(F) + O_p(1/\sqrt{n})$, we have $L(F_n) - L(F) = O_p(1/\sqrt{n})$. Likewise we have
$L_2(F_n) - L_2(F) = O_p(1/\sqrt{n})$. These, (12) and the boundedness of $S_0(F)$,
$s_2(F)$, $L(F)$ and $L_2(F)$ yield $S(F_n) - S(F) = O_p(1/\sqrt{n})$. □

Proof of Theorem 2.2. Employing the notation in the proof of
Theorem 2.1, write

\[
\begin{aligned}
&\sqrt{n}\Big(\int xx'w_2(D(x,F_n))\,dF_n(x) - \int xx'w_2(D(x,F))\,dF(x)\Big) \\
&\qquad = \int xx'w_2^{(1)}(\theta_{2n}(x))H_n(x)\,dF_n(x) + \int xx'w_2(D(x,F))\,d\nu_n(x),
\end{aligned}
\]

where $\theta_{2n}(x)$ is a point between $D(x,F_n)$ and $D(x,F)$. Following the proof of
Theorem 2.1 and by (A1)-(A4) (and, consequently, the asymptotic tightness
of $H_n$ on $S_n$), we can show that

\[
\begin{aligned}
\int xx'w_2^{(1)}(\theta_{2n}(x))H_n(x)\,dF_n(x)
&= \int xx'w_2^{(1)}(D(x,F))H_n(x)\,dF_n(x) + o_p(1) \\
&= \int xx'w_2^{(1)}(D(x,F))H_n(x)\,dF(x) + o_p(1).
\end{aligned}
\]

Therefore,

\[
\begin{aligned}
&\sqrt{n}\Big(\int xx'w_2(D(x,F_n))\,dF_n(x) - \int xx'w_2(D(x,F))\,dF(x)\Big) \\
&\qquad = \int xx'w_2^{(1)}(D(x,F))H_n(x)\,dF(x) + \int xx'w_2(D(x,F))\,d\nu_n(x) + o_p(1).
\end{aligned}
\]

By (A4) and Fubini's theorem, we have

\[
\begin{aligned}
&\sqrt{n}\Big(\int xx'w_2(D(x,F_n))\,dF_n(x) - \int xx'w_2(D(x,F))\,dF(x)\Big) \\
&\qquad = \int\Big(\int yy'w_2^{(1)}(D(y,F))h(y,x)\,dF(y) + xx'w_2(D(x,F))\Big)\,d\nu_n(x) + o_p(1).
\end{aligned} \tag{15}
\]

Likewise, we can show that

\[
\begin{aligned}
\sqrt{n}(s_2(F_n) - s_2(F))
&= \sqrt{n}\Big(\int w_2(D(x,F_n))\,dF_n(x) - \int w_2(D(x,F))\,dF(x)\Big) \\
&= \int\Big(\int w_2^{(1)}(D(y,F))h(y,x)\,dF(y) + w_2(D(x,F))\Big)\,d\nu_n(x) + o_p(1),
\end{aligned} \tag{16}
\]

and for $i = 1,2$ [see the proof of Theorem 2.1 of Zuo, Cui and He (2004)],

\[
\begin{aligned}
\sqrt{n}(L_i(F_n) - L_i(F))
={}& \int\Big(\int (y - L_i(F))w_i^{(1)}(D(y,F))h(y,x)\,dF(y) + (x - L_i(F))w_i(D(x,F))\Big)\,d\nu_n(x) \\
& \times \Big\{\int w_i(D(x,F))\,dF(x)\Big\}^{-1} + o_p(1).
\end{aligned} \tag{17}
\]
Note that $s_2(F_n) = s_2(F) + o_p(1)$ and $L_i(F_n) = L_i(F) + o_p(1)$, $i = 1,2$ (see
the proof of Theorem 2.1). By (12) and (15)-(17), we have

\[
\begin{aligned}
\sqrt{n}(S(F_n) - S(F)) ={}& \int\big(K_s(x,F) - K_1(x,F)(L_2(F) - L_1(F))' \\
& \quad - (L_2(F) - L_1(F))(K_1(x,F))'\big)\,d\nu_n(x) + o_p(1).
\end{aligned} \tag{18}
\]

Note that $\mathrm{vec}(ab') = b \otimes a$ for any $a, b \in \mathbb{R}^d$. The desired result now follows
from the CLT. □
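
The identity $\mathrm{vec}(ab') = b \otimes a$ used in the last step is easy to confirm numerically. The sketch below uses plain Python lists, with vec stacking the columns of a matrix; the helper names are illustrative.

```python
def outer(a, b):
    """Return the matrix a b' as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]


def vec(matrix):
    """Stack the columns of a matrix into one long vector."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]


def kron_vectors(b, a):
    """Kronecker product b (x) a of two vectors: (b_1 a', ..., b_d a')'."""
    return [bj * ai for bj in b for ai in a]
```

For `a = [1, 2, 3]` and `b = [4, 5, 6]`, both `vec(outer(a, b))` and `kron_vectors(b, a)` give `[4, 8, 12, 5, 10, 15, 6, 12, 18]`: column $j$ of $ab'$ is $b_j a$, which is exactly block $j$ of $b \otimes a$.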

Proof of Theorem 2.3. The proof follows closely that of Theorem 
2.2 and is thus omitted. □ 

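Proof of Corollary 3.1. By Theorem 3.2, $K(x) = K_s(x,F)$ since
$L_i(F) = \theta$ for $i = 1,2$. Assume without loss of generality that $\theta = 0$. For the
given $F$ and $(\mu, \sigma)$, it follows that

\[
u(x) = \Sigma^{-1}x/\|\Sigma^{-1}x\| \ (x \ne 0), \qquad
\sigma(F_{u(x)}) = m_0\sqrt{u(x)'\Sigma u(x)}, \qquad
O(x,F) = \|\Sigma^{-1/2}x\|/m_0.
\]

Let $u = z/\|z\|$. Observe that

\[
\begin{aligned}
\mathrm{PWS}(F) &= \frac{\int xx'w_2(\mathrm{PD}(x,F))\,dF}{\int w_2(\mathrm{PD}(x,F))\,dF}
= \frac{\Sigma^{1/2}\big(\int zz'w_2(s_0(\|z\|))\,dF_0\big)\Sigma^{1/2}}{\int w_2(s_0(\|z\|))\,dF_0} \\
&= E(\|Z\|^2 w_2(s_0(\|Z\|)))\,\Sigma^{1/2}\Big(\int uu'\,dF_0\Big)\Sigma^{1/2}/c_0 = c_1\Sigma
\end{aligned}
\]

by, for example, Lemma 5.1 of Lopuhaä (1989). By Lemma 3.4, it follows
that for any $x, y \in \mathbb{R}^d$,

\[
f_1(x,u(y)) = \frac{\|\Sigma^{-1/2}y\|}{2p(0)\|\Sigma^{-1}y\|}\,\mathrm{sign}(y'\Sigma^{-1}x), \qquad
f_2(x,u(y)) = \frac{\|\Sigma^{-1/2}y\|}{4p(m_0)\|\Sigma^{-1}y\|}\,\mathrm{sign}(|y'\Sigma^{-1}x| - m_0\|\Sigma^{-1/2}y\|).
\]

Note that $f_1(x,u(y))$ is an odd function of $y$. By Lemma 3.2, we have

\[
\begin{aligned}
c_0K_s(x,F) &= \int (yy' - c_1\Sigma)\,w_2^{(1)}(s_0(\|\Sigma^{-1/2}y\|))\,h(y,x)\,dF(y)
 + (xx' - c_1\Sigma)\,w_2(s_0(\|\Sigma^{-1/2}x\|)) \\
&= \int \frac{(yy' - c_1\Sigma)\,w_2^{(1)}(s_0(\|\Sigma^{-1/2}y\|))\,O(y,F)\,f_2(x,u(y))}{\sigma(F_{u(y)})(1 + O(y,F))^2}\,dF(y)
 + (xx' - c_1\Sigma)\,w_2(s_0(\|\Sigma^{-1/2}x\|)).
\end{aligned}
\]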
Let $\tilde x = \Sigma^{-1/2}x$, $\tilde y = \Sigma^{-1/2}y$, $\tilde y/\|\tilde y\| = u = (u_1, \ldots, u_d)'$ and $T$ be an
orthogonal matrix with $\tilde x/\|\tilde x\|$ as its first column. We have

c K s (x,F) - (xx' - Cl J:)w 2 (s (\\^ 2 x\\)) 

(^-c 1 S)4 1) ( go (||y||))||y|| S g(||^||)sign(|(0x|-m o ||y||) 

4mQp(mo) 

= S^ 1 / 2 y {(^ - Cl ^)^ 1 >( So (||^||))||^||^§(||^||) S ign(|(^)'^| - r^oll^N)} 

= ^ 2 t f{{y/\m'/\\y\\\\y\\ 2 -cih) 

x 4 1) ( s o(||y||))||y||so(||y||)sign(|ui|||x|| -m )} 
x{4m^(m )}- 1 dF (y)T / S 1 / 2 

= ^ 2 t(c 3 J nn , sign(|u 1 |||x||-m )(iF (y)- ClC2Sl (||x||)/^T / S 1 / 2 , 

by Theorem 1.5.6 of Muirhead (1982). Note that

Tc 3 /"Wsign(|«i|||£|| - m ) dF (y)T' 

= r C3 diag( S2 (||x||), s 2 (||x||), . . . , h(\\x\\))T' 

= csh(\\x\\)I d + cs(s 2 (\\x\\) - g 2 (||x||))^^, 

where s 2 (i) = / ti 2 sign(|ni|t — mo)dFo(y) = (si(t) — s 2 (i))/(d — 1). There- 
fore, we have 

^(X) = K s (X,F) = ls 1 /2^ l( ||l|| ) ^^_ + t2 (||x||)/ fi )s 1 /2. 

Now invoking Lemmas 5.1 and 5.2 of Lopuhaa (1989), we obtain the desired 
result. □ 

Proof of Theorem 3.5. We need the following lemma. Its proof is
skipped. Note that $F(\varepsilon,\delta_y) = (1 - \varepsilon)F + \varepsilon\delta_y$ and $F_u(\varepsilon,\delta_y) = (1 - \varepsilon)F_u + \varepsilon\delta_{u'y}$
for any unit vector $u$.

Lemma 5.1. Suppose that $X \sim F$ is elliptically symmetric about the
origin with a positive definite matrix $\Sigma$ associated. Let $\sigma(u) = \sqrt{u'\Sigma u}$. Then:

1. $\mathrm{Med}(F_u(\varepsilon,\delta_x)) = \mathrm{Med}\{-\sigma(u)d_1(\varepsilon), u'x, \sigma(u)d_1(\varepsilon)\}$, and
2. $\mathrm{MAD}(F_u(\varepsilon,\delta_x)) = \mathrm{Med}\{\sigma(u)m_1(\mathrm{Med}(F_u(\varepsilon,\delta_x))/\sigma(u),\varepsilon), |u'x - \mathrm{Med}(F_u(\varepsilon,\delta_x))|, \sigma(u)m_2(\mathrm{Med}(F_u(\varepsilon,\delta_x))/\sigma(u),\varepsilon)\}$.
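
Lemma 5.1 can be checked numerically in the simplest case. The sketch below assumes a univariate standard normal $F_u$ (so $\sigma(u) = 1$), computes the median and MAD of the mixture $(1-\varepsilon)N(0,1) + \varepsilon\delta_x$ directly from its distribution function by bisection, and compares them with the lemma's median-of-three formulas. The function names, the bisection bracket and the iteration count are illustrative choices.

```python
from statistics import NormalDist

Y = NormalDist()  # assumption: F_u is N(0, 1), hence sigma(u) = 1


def med3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def smallest_true(pred, lo, hi, iters=200):
    """Smallest t in [lo, hi] with pred(t) true, for a monotone predicate."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return hi


def mixture_median(x, eps):
    """Median of (1 - eps) N(0,1) + eps delta_x from its distribution function."""
    cdf = lambda t: (1 - eps) * Y.cdf(t) + (eps if t >= x else 0.0)
    return smallest_true(lambda t: cdf(t) >= 0.5, -50.0, 50.0)


def mixture_mad(x, eps):
    """MAD of the mixture: smallest m with P(|X - Med| <= m) >= 1/2."""
    med = mixture_median(x, eps)

    def mass(m):
        point = eps if abs(x - med) <= m else 0.0
        return (1 - eps) * (Y.cdf(med + m) - Y.cdf(med - m)) + point

    return smallest_true(lambda m: mass(m) >= 0.5, 0.0, 50.0)


def lemma_median(x, eps):
    """Item 1 of Lemma 5.1 with sigma(u) = 1."""
    d1 = Y.inv_cdf(1.0 / (2.0 * (1.0 - eps)))
    return med3(-d1, x, d1)


def lemma_mad(x, eps):
    """Item 2 of Lemma 5.1 with sigma(u) = 1; m_1 and m_2 are found by bisection."""
    c = lemma_median(x, eps)
    solve = lambda q: smallest_true(
        lambda m: Y.cdf(c + m) - Y.cdf(c - m) >= q, 0.0, 50.0)
    m1 = solve((1 - 2 * eps) / (2 * (1 - eps)))
    m2 = solve(1 / (2 * (1 - eps)))
    return med3(m1, abs(x - c), m2)
```

For contamination points far from, near and at the center (for instance `x = 3.0`, `0.5` and `0.05` with `eps = 0.1`), the direct and the lemma-based values agree to numerical precision, covering the cases where the median of three is attained by $m_2$, by $m_1$, and where the contaminating point itself becomes the median.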

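We now turn to the proof of Theorem 3.5. By Lemma 5.1, for any $y \in \mathbb{R}^d$,
we have that

\[
\mu(F_u(\varepsilon,\delta_y))/\sigma(u) = \mathrm{Med}\{-\sigma(u)d_1, u'y, \sigma(u)d_1\}/\sigma(u)
= \mathrm{Med}\{-d_1(\varepsilon), ((\Sigma^{1/2}u)'/\sigma(u))\Sigma^{-1/2}y, d_1(\varepsilon)\},
\]
\[
\frac{\sigma(F_u(\varepsilon,\delta_y))}{\sigma(u)} = \mathrm{Med}\Big\{ m_1\Big(\frac{\mu(F_u(\varepsilon,\delta_y))}{\sigma(u)},\varepsilon\Big),
\Big|\frac{u'y}{\sigma(u)} - \frac{\mu(F_u(\varepsilon,\delta_y))}{\sigma(u)}\Big|,
m_2\Big(\frac{\mu(F_u(\varepsilon,\delta_y))}{\sigma(u)},\varepsilon\Big)\Big\}.
\]

Let $\tilde u = \Sigma^{1/2}u/\sigma(u)$, $\tilde y = \Sigma^{-1/2}y$ and $\tilde x = \Sigma^{-1/2}x$. Then all the mappings are
one-to-one and $\|\tilde u\| = 1$. Denote $f_5(u,x,d_1) = \mathrm{Med}\{-d_1, u'x, d_1\}$. Then

\[
O(x,F(\varepsilon,\delta_y)) = \sup_{\|v\|=1}
\frac{v'\tilde x - f_5(v,\tilde y,d_1)}{\mathrm{Med}\{m_1(f_5(v,\tilde y,d_1),\varepsilon), |v'\tilde y - f_5(v,\tilde y,d_1)|, m_2(f_5(v,\tilde y,d_1),\varepsilon)\}}.
\]

Let $U$ be an orthogonal matrix with $\tilde y/\|\tilde y\|$ as its first column, and $U'v = \tilde v$.
Then $f_5(v,\tilde y,d_1) = \mathrm{Med}\{-d_1, \tilde v_1\|\tilde y\|, d_1\} = f_4(\tilde v_1,\|\tilde y\|,d_1)$ and $O(x,F(\varepsilon,\delta_y))$
becomes

\[
\begin{aligned}
&\sup_{\|\tilde v\|=1}\{\tilde v'U'\tilde x - f_4(\tilde v_1,\|\tilde y\|,d_1)\} \\
&\qquad \times \{\mathrm{Med}\{m_1(f_4(\tilde v_1,\|\tilde y\|,d_1),\varepsilon),
|\tilde v_1\|\tilde y\| - f_4(\tilde v_1,\|\tilde y\|,d_1)|, m_2(f_4(\tilde v_1,\|\tilde y\|,d_1),\varepsilon)\}\}^{-1} \\
&\quad = \sup_{\|\tilde v\|=1}(\tilde v'U'\tilde x - f_4(\tilde v_1,\|\tilde y\|,d_1))/f_3(\tilde v_1,\|\tilde y\|,d_1).
\end{aligned}
\]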

It follows that 

xx'w 2 (PD{x, F(e, 5 y ))) dF(x) 
Observe that

\[
\begin{aligned}
\sup_{\|u\|=1} \frac{u'x - f_4(u_1,\|\tilde y\|,d_1)}{f_3(u_1,\|\tilde y\|,d_1)}
&= \sup_{-1 \le u_1 \le 1}\ \sup_{\|u_2\| = \sqrt{1-u_1^2}}
\frac{u_2'x_2 + u_1x_1 - f_4(u_1,\|\tilde y\|,d_1)}{f_3(u_1,\|\tilde y\|,d_1)} \\
&= \sup_{0 \le u_1 \le 1}
\frac{\sqrt{1-u_1^2}\,\|x_2\| + |u_1x_1 - f_4(u_1,\|\tilde y\|,d_1)|}{f_3(u_1,\|\tilde y\|,d_1)} \\
&= f_1(x,\|\tilde y\|,\varepsilon),
\end{aligned}
\]

which is an even function of $x_2$. Hence,

xx'w 2 (PD(x, F(e, S y ))) dF(x) 

+ f 1 (x,\\y\\,e)))U'X 1 / 2 dF (x) 

= ^(l|y||,e)/d + (^(||y||,e) - ^(ll»ll' e ))]|jf^f) El/a - 

Likewise, we can show that 

xwi(PD(x, F(e, Sy))) dF(x) 

= (y/\\y\\) J x 1 w t (l/(l + f 1 (x,\\y\\,e)))dF (x). 
Thus 

Li(F(e,5 y )) 

= {(y/\\y\\) ((1 - e) / x lWt (l/(l + h(x, \\yle))) dF (x) 

+ £ ||y||^(l/(l + / 2 (||y||,e))) 
x |(l- e ) J Wl {l/{l + h{x,\\yle)))dFv{x) 

+ ^(l/(l + / 2 (||y||, e )))| 1 

and 

PWS(F(e,5 y )) 

(1 - e) (M\\y\l^ + (iM||y||,e) - MMl e))± 
n~ii2 ( 1 ^ y y' 

+ £ \\y\\ W 2 , TFhiFii 

Vl + /2(||y||,e)/ ||y|| IMI 
x{{l-e) fw 2 ( 1 ) rfFo(x) 



+ eu)2 



1 



-i 



.l + / 2 (||y||,e) 

- L\{F{e, ^j/))(-t'2(-^ 1 (e, <^)))' - L 2 (F(e, 5 y ))(Li(F(e, 5 y )))' 
+ L 1 (F(e,5 y ))(L 1 (F(e,5 y )))'. 
The desired result follows. □ 

Proof of Theorem 3.6. (a) is trivial. We now show (b). Assume,
w.l.o.g., that $\theta = 0$. Since $\mathrm{CSI}(\mathrm{PWS},F) \ge \mathrm{GESI}(\mathrm{PWS},F)$, we need to show
that $\mathrm{CSI}(\mathrm{PWS},F) \le \mathrm{GESI}(\mathrm{PWS},F)$. Following the proof of Theorem 2.3
and noting that $L_i(F(\varepsilon,\delta_y)) = L_i(F) + o(1)$, $i = 1,2$, we can show that

(PWS(F(e,5 y ))-PWS(F))/e 



xx'w^ (PD(x, F))H £ (x, y)F(dx) + yy'w 2 {PD(y, F)) 
x (J w 2 (PD{x,F))F(dx^j 

-PWS(F) (J w i 2 1) (PD(x,F))H £ (x,y)F(dx)+w 2 (PD(y,F)) 
J w 2 (PD(x,F))F(dx)^j +o(l), 



where $o(\cdot)$ is in the uniform sense with respect to $y \in \mathbb{R}^d$. Following the
proof of Theorem 3.5 of Zuo, Cui and Young (2004) and letting $g(x,u,F) =
(u'x - \mu(F_u))/\sigma(F_u)$, we have

xx'w { 2 l) (PD(x, F))H e (x, y)F(dx) + yy'w 2 (PD(y, F)) 



x (J w 2 (PD(x,F))F(dx)^j 



xx' w P(PD(x, F) fM^F)-g(x,u^)F(e,6 y )) 
s(x,M) e{l + 0(x,F)y 

+ yy'w 2 {PD{y,F)) 
J w 2 (PD(x,F))dF(x)} 1 +I 5 (M,y,e) + o(l), 
where $S(x,M) = \{x : 1/M \le \|\Sigma^{-1/2}x\| \le M\}$ for a fixed $M > 0$,
$\sup_{y \in \mathbb{R}^d,\, \varepsilon < 0.5} \|I_5(M,y,\varepsilon)\| \to 0$
as $M \to \infty$ and $o(\cdot)$ is in the uniform sense in $y \in \mathbb{R}^d$. Note that $u(x) =
\Sigma^{-1}x/\|\Sigma^{-1}x\|$ and $\sigma(F_{u(x)}) = m_0\|\Sigma^{-1/2}x\|/\|\Sigma^{-1}x\|$ for $x \ne 0$, $\mu(F_{u(x)}) = 0$
and $O(x,F) = \|\Sigma^{-1/2}x\|/m_0$. By Lemma 5.1 we see that $\mu(F(\varepsilon,\delta_y))$ is odd
in $y$. Therefore,

xx'w^ (D(x, F))H £ {x, y)F{dx) + yy'w 2 (D(y, F)) 

i 



w 2 (D{x,F))F(dx) 

xx'w^isoiW^-^xWm^x] 



S{x,M) 



dF(x) + yy'w 2 {PD(y,F)) 



mle{l + 0{x,F)) 2 
J w 2 (PD(x,F))dF(x)} 1 + I 5 (M,y,e) + o(l), 

where $o(\cdot)$ is in the uniform sense with respect to $y \in \mathbb{R}^d$. Call the first term
in the right-hand side of the last equality $I_6 = I_6(M,y,\varepsilon)$. By Lemma 5.1,



-Med 



S 1 x\\ { V a(u(x)) 

x'YT^y KF(e,S y )) 



WE-WxW a(u(x)) 



where $\mu(F(\varepsilon,\delta_y))/\sigma(u(x)) = \mathrm{Med}\{-d_1, x'\Sigma^{-1}y/\|\Sigma^{-1/2}x\|, d_1\}$. Let $\tilde x = \Sigma^{-1/2}x$,
$\tilde y = \Sigma^{-1/2}y$ and $T$ be an orthogonal matrix with $\tilde y/\|\tilde y\|$ as its first column.

Note that $T'\tilde X \stackrel{d}{=} \tilde X$. Denote $T'\tilde x/\|\tilde x\| = u = (u_1, \ldots, u_d)'$. Changing variables
($\tilde x = \Sigma^{-1/2}x$) and then taking an orthogonal transformation (with matrix $T$)
and taking advantage of the independence of $\|\tilde X\|$ and $\tilde X/\|\tilde X\|$ [see Lemma
5.1 of Lopuhaä (1999)], we have

\T}I 2 T [ ^^(^(llxll))^^^^!!) 17 ^'^ dF (x) 

Ji/m<\\x\\<m mfie 

xT'V l / 2 +yy'w 2 {s {\\ 

x \ I w 2 (s Q {\\x\\))dF Q (x ^ 
E^T / \\x\\ 3 w£\s m\))4(m\)dF (x) 

Jl/M<\\x\\<M 

x / uu Mui ^ £) dF (x)T>xVA 

Ji/m<\\x\\<m m^e J 

x ij w 2 (s (\\x\\))dF (x^ 

+ yy'w 2 (so(\\y\\))/ I w 2 (s (\\x\\)) dF (x 



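where $I_7(u_1,\tilde y,\varepsilon) = \mathrm{Med}\{m_1(I_8(u_1,\tilde y,\varepsilon),\varepsilon), |u_1\|\tilde y\| - I_8(u_1,\tilde y,\varepsilon)|, m_2(I_8(u_1,\tilde y,\varepsilon),\varepsilon)\} -
m_0$ and $I_8(u_1,\tilde y,\varepsilon) = \mathrm{Med}\{-d_1, u_1\|\tilde y\|, d_1\}$. It can be shown (details are
skipped) that

\[
I_7(u_1,\tilde y,\varepsilon)/\varepsilon = \mathrm{sign}(|u_1|\|\tilde y\| - m_0)/(4p(m_0)) + o(1),
\]

where $o(1) \to 0$ uniformly in $\tilde y \in \mathbb{R}^d$ as $\varepsilon \to 0$. Following the proof of
Corollary 3.1, we have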

_ S 1 / 2 r C3 (M)/ 1/A/ < ||a|| < M W S ign(|m|||si|| -mo)T'xV 2 
6 Jw 2 (s (\\x\\))dFo(x) 

yy'w 2 {so(\\y\\)) m 

+ Jw 2 (s (\\x\\))dFo(x)^ { > 

s 1 / 2 c 3 (A^)(g 2 (Hy||, iv^)z rf + ( g2 (||^||, ikr) - ^(ii^iu^j^/ii^H^/iiyiDs 1 / 2 

Jw 2 (so(\\x\\))dFo(x) 
+ yy'w 2 (s (\\y\\))/ Jw 2 (s (\\^))dF (x)+o(l), 
where o(l) is in the same sense as before. Further, 

c 3 (M)= / \\x\\ 3 w£\ So m\))4(\m)dFo(i), 

Jl/M<\\x\\<M 

s 2 (t,M)= / ulsign(\ui\t-mo)dF (x), 

Jl/M<\\x\\<M 

s 2 (t,M)= / u|sign(|«i|i — m )dF (x). 

Jl/M<\\x\\<M 

Therefore, 

(PWS(F(e,8 y ))-PWS(F))/e 

= ^ 2 -fL(c 3 (M)(s 2 (\\y\\,M) - ~s 2 (\\y\lM)) + \\yf W2 ( S0 (\\y\\)))^ 1/2 
x (J w 2 (s (\\x\\))dF (x) 

+ h(M,y,e) + o(l) 

c 3 (M)s 2 (\\y\\,M) 



/^( S o(PII))dFo(£) 

fw^(D(x,F))H e (x,y)F(dx) + w 2 (D(y,F)) 



cr 



Jw 2 (D(x,F))F(dx) 



where again $o(1) \to 0$ uniformly in $y \in \mathbb{R}^d$ as $\varepsilon \to 0$. From the definition of
CSI, it follows that



CSl(PMS,F) 



lim sup 

e->0+ ..cud 



< lim sup 

£-+0+ 



{^ 2 y/\\Mc 3 (M)(s 2 miM) - h(\\y\\,M)) 

+ \\y\\ 2 w 2 (s (\\y\\)))y/\\m 1/2 } 
x|| w 2 (s (\\x\\))dF (x)} +I 5 (M,y,e) + o{l) 

{^ 2 y/\\mc 3 (M)(s 2 (\\y\\,M)-U\\y\lM)) 

+ \\y\\ 2 w 2 (s (\mm'/\\m 1/2 } 

x ( / w 2 (s (\\x\\))dF (x^ 



+ lim sup \\\I 5 (M,y,e) 



< Ai sup 

r>0 



c 3 (M)(s 2 ( y r, M) - g 3 (r, M )) + rVa(«o(r)) 



/^( S o(PII))dF (£) 



+ lim sup |||I 5 (M,y,e)|||. 



Now letting $M \to \infty$, we get $\mathrm{CSI}(\mathrm{PWS}, F) \le \lambda_1 \sup_{r>0} |t_1(r)|/c_0 = \mathrm{GESI}(\mathrm{PWS}, F)$.
□